<?xml version="1.0" encoding="UTF-8"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>An Introduction to the theory of statistics</title>
        <author>
          <persName>
            <forname>George Udny</forname>
            <surname>Yule</surname>
          </persName>
        </author>
      </titleStmt>
      <publicationStmt />
      <sourceDesc>
        <bibl>
          <msIdentifier>
            <idno>1751730271</idno>
          </msIdentifier>
        </bibl>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <div>
        <pb n="1" />
        <pb n="2" />
        <pb n="3" />
        <pb n="4" />
        <pb n="5" />
        AN INTRODUCTION TO THE
THEORY OF STATISTICS.
        <pb n="6" />

Charles Griffin &amp; Co., Ltd., Publishers

Note.—All prices are net, postage extra.

BIOMATHEMATICS. Being the Principles of Mathematics for Students
of Biological Science. By W. M. FELDMAN, M.D.,
B.S.(Lond.), F.R.S.(Edin.). In Cloth. Pp. i-xix+398. With 125
Diagrams. 21s.

“An excellent introduction, and worthy of great praise.”—Edin. Med. Journ.

TARIFFS: A Study in Method. By T. E. G. GREGORY, D.Sc. (Econ.),
London. In Demy 8vo. Cloth. Pp. i-xv+518. 25s.

“Will be of special interest to business men and all interested in current economic
problems.”—Chamber of Commerce Journal.

“It ought to be in the library of every legislature throughout the world.”—New
Statesman.

MEASUREMENT CONVERSIONS (English
and French). 43 Graphic Tables or Diagrams, on 28 Plates, showing
at a glance the Mutual Conversion of Measurements in Different Units
of Lengths, Areas, Volumes, Weights, Stresses, Densities, Quantities of
Work, Horse-Power, Temperature, etc. By Prof. ROBERT H. SMITH,
A.M.Inst.C.E., M.I.Mech.E. In Quarto. Boards. 7s. 6d.

“A work which should prove invaluable to Engineers, Surveyors, Architects,
Contractors, etc.”

TEXT-BOOK OF MATHEMATICS AND
MECHANICS. Specially arranged for the Use of Students Qualifying
for Science and Technical Examinations. By A. A. Capito,
M.Sc. Eng. (Hafnia), Second Edition. In Large Crown 8vo. Cloth.
In Two Volumes. Vol. I.—ANALYTICAL GEOMETRY. Pp. i-xii+169.
7s. 6d. Vol. II.—MECHANICS. Pp. i-xii+170-398. 7s. 6d.

“The expository power of the author is considerable, and the engineering student,
who has carefully gone through the book will have nothing to unlearn.”—Mathematical
Gazette.

THE OFFICIAL YEAR-BOOK OF THE
SCIENTIFIC AND LEARNED SOCIETIES OF GREAT
BRITAIN AND IRELAND: compiled from Official
Sources.

Comprising (together with other Official Information) Lists of the Papers read during
the Session 1925-1926 before all the leading Societies throughout the Kingdom
engaged in the following Departments of Research: § 1. Science Generally: i.e.,
Societies occupying themselves with several Branches of Science, or with Science
and Literature jointly. § 2. Astronomy, Meteorology, Mathematics and Physics.
§ 3. Chemistry and Photography. § 4. Geology, Geography, and Mineralogy.
§ 5. Biology, including Botany, Microscopy, and Anthropology. § 6. Economic
Science and Statistics. § 7. Mechanical Science, Engineering, and Architecture.
§ 8. Naval and Military Science. § 9. Agriculture and Horticulture. § 10. Law.
§ 11. Literature, History, and Music. § 12. Psychology. § 13. Archæology.
§ 14. Medicine.

42nd Issue. Strongly Bound in Cloth. 48s. (Other volumes are
available.)

“A veritable mine of reference . . . there is no other publication that can in
any sense be substituted for it.”—Colliery Guardian.

LONDON: CHAS. GRIFFIN &amp; CO., LTD., 42 Drury Lane, W.C. 2
        <pb n="7" />
        AN INTRODUCTION TO THE
THEORY OF STATISTICS
BY
G. UDNY YULE, C.B.E., M.A., F.R.S.,
FELLOW OF ST. JOHN'S COLLEGE, AND
UNIVERSITY LECTURER IN STATISTICS, CAMBRIDGE;
HONORARY VICE-PRESIDENT OF THE ROYAL STATISTICAL
SOCIETY OF LONDON;
HONORARY MEMBER OF THE AMERICAN STATISTICAL ASSOCIATION;
MEMBER OF THE INTERNATIONAL STATISTICAL INSTITUTE;
FELLOW OF THE ROYAL ANTHROPOLOGICAL INSTITUTE.

With 53 Figures and Diagrams.

EIGHTH EDITION, REVISED.
LONDON:
CHARLES GRIFFIN AND COMPANY, LIMITED,
42 DRURY LANE, W.C.2
1927.
[All Rights Reserved.]
        <pb n="8" />
Printed in Great Britain by
NEILL &amp; Co., LTD., EDINBURGH.
        <pb n="9" />
        PREFACE TO THE EIGHTH EDITION.
THAT a new edition should be called for within three years of
the issue of the last is satisfactory evidence that the Introduction
to the Theory of Statistics, which has now been before the public
for over fifteen years, continues to hold its own amongst its now
numerous competitors.

In the present edition an additional supplement has been included
directing attention to the vagaries of observers in reading
a scale, a curious subject on which a more extended note will
shortly be published in the Journal of the Royal Statistical
Society, and giving an example of the interesting frequency
distributions presented by sizes of genera in any biological group.
The lists of references have been brought up to date as usual and
all new matter incorporated in the index.

A Czech copyright edition was agreed and the translation
made by Dr Vladimir Novak, Professor of Physics at the Czech
Technical High School at Brno, and Dr Jos. Mraz, Ministerial
Councillor of the State Statistical Office and Docent of Statistics
at the Czech Technical High School, Prague, and this appeared
towards the close of last year, published by the State Statistical
Office. That the State Statistical Office should have undertaken
the publication of a work on pure theory shows a very noteworthy
breadth of view as to the functions of such an Office:
that the authorities should have selected the present work for
translation was equally gratifying to the author and the publishers.
The author expresses his personal indebtedness to
Dr Mraz, and thanks him for the care with which he read the
text, and called attention to a number of minor points on which
correction has been duly made.

G. U. Y.

February 1927.
        <pb n="10" />
        <pb n="11" />
PREFACE TO THE FIRST EDITION.

THE following chapters are based on the courses of instruction
given during my tenure of the Newmarch Lectureship in Statistics
at University College, London, in the sessions 1902-1909. The
variety of illustrations and examples has, however, been increased
to render the book more suitable for the use of biologists and
others besides those interested in economic and vital statistics,
and some of the more difficult parts of the subject have been
treated in greater detail than was possible in a sessional course
of some thirty lectures. For the rest, the chapters follow closely
the arrangement of the course, the three parts into which the
volume is divided corresponding approximately to the work of
the three terms. To enable the student to proceed further with
the subject, fairly detailed lists of references to the original
memoirs have been given at the end of each chapter: exercises
have also been added for the benefit, more especially, of the
student who is working without the assistance of a teacher.

The volume represents an attempt to work out a systematic
introductory course on statistical methods—the methods available
for discussing, as distinct from collecting, statistical data—suited
to those who possess only a limited knowledge of mathematics :
an acquaintance with algebra up to the binomial theorem,
together with such elements of co-ordinate geometry as are now
generally included therewith, is all that is assumed. I hope that
it may prove of some service to the students of the diverse
sciences in which statistical methods are now employed.
My most grateful thanks are due to Mr R. H. Hooker not only
        <pb n="12" />

for reading the greater part of the manuscript, and the proofs,
and for making many criticisms and suggestions which have
been of the greatest service, but also for much friendly help and
encouragement without which the preparation of the volume,
often delayed and interrupted by the pressure of other work,
might never have been completed: my debt to Mr Hooker is
indeed greater than can well be expressed in a formal preface.
My thanks are also due to Mr H. D. Vigor for some assistance
in checking the arithmetic, and my acknowledgments to Professor
Edgeworth for the example used in §5 of Chap. XVII. to illustrate
the influence of the form of the frequency distribution on the
probable error of the median.

I can hardly hope that all errors in the text or in the mass
of arithmetic involved in examples and exercises have been
eliminated, and will feel indebted to any reader who directs
my attention to any such mistakes, or to any omissions, ambiguities,
or obscurities.

G. U. Y.

December 1910.
        <pb n="13" />
        CONTENTS.
INTRODUCTION.
PAGES

1-3. The introduction of the terms “statistics,” “statistical,” into
the English language—4-6. The change in meaning of these
terms during the nineteenth century—7-9. The present use
of the terms—10. Definitions of “statistics,” “statistical
methods,” “theory of statistics,” in accordance with present
usage . . . 1-6

PART I.—THE THEORY OF ATTRIBUTES.

CHAPTER I.
NOTATION AND TERMINOLOGY.

1-2. Statistics of attributes and statistics of variables : fundamental
character of the former—3-5. Classification by dichotomy—
6-7. Notation for single attributes and for combinations—
8. The class-frequency—9. Positive and negative attributes,
contraries—10. The order of a class—11. The aggregate—
12. The arrangement of classes by order and aggregate—
13-14. Sufficiency of the tabulation of the ultimate class-frequencies—
15-17. Or, better, of the positive class-frequencies—18. The
class-frequencies chosen in the census for tabulation of statistics
of infirmities—19. Inclusive and exclusive notations and
terminologies . . . 7-16

CHAPTER II.
CONSISTENCE.

1-3. The field of observation or universe, and its specification by
symbols—4. Derivation of complex from simple relations by
specifying the universe—5-6. Consistence—7-10. Conditions
of consistence for one and for two attributes—11-14. Conditions
of consistence for three attributes . . . 17-24

CHAPTER III.
ASSOCIATION.
1-4. The criterion of independence—5-10. The conception of
association, and testing for the same by the comparison
of percentages—11-12. Numerical equality of the differences
between the four second-order frequencies and their
independence values—13. Coefficients of association—14.
Necessity for an investigation into the causation of an
attribute A being extended to include non-A’s . . . 25-41
CHAPTER IV.
PARTIAL ASSOCIATION.
1-2. Uncertainty in interpretation of an observed association—3-5.
Source of the ambiguity : partial associations—6-8. Illusory
association due to the association of each of two attributes
with a third—9. Estimation of the partial associations from
the frequencies of the second order—10-12. The total
number of associations for a given number of attributes—
13-14. The case of complete independence . . . 42-59
CHAPTER V.
MANIFOLD CLASSIFICATION.
1. The general principle of a manifold classification—2-4. The
table of double entry or contingency table and its treatment
by fundamental methods—5-8. The coefficient of contingency—
9-10. Analysis of a contingency table by tetrads—11-13. Isotropic
and anisotropic distributions—14-15. Homogeneity of the
classifications dealt with in the preceding chapters : heterogeneous
classifications . . . 60-74
PART II.—THE THEORY OF VARIABLES.
CHAPTER VI.
THE FREQUENCY-DISTRIBUTION.
1. Introductory—2. Necessity for classification of observations : the
frequency-distribution—3. Illustrations—4. Method of forming
the table—5. Magnitude of class-intervals—6. Position
of intervals—7. Process of classification—8. Treatment of
intermediate observations—9. Tabulation—10. Tables with
unequal intervals—11. Graphical representation of the
frequency-distribution—12. Ideal frequency-distributions—
13. The symmetrical distribution—14. The moderately
asymmetrical distribution—15. The extremely asymmetrical
or J-shaped distribution—16. The U-shaped distribution . . . 75-105

        <pb n="15" />
CHAPTER VII.
AVERAGES.
1. Necessity for quantitative definition of the characters of a
frequency-distribution—2. Measures of position (averages)
and of dispersion—3. The dimensions of an average the
same as those of the variable—4. Desirable properties for
an average to possess—5. The commoner forms of average—
6-13. The arithmetic mean : its definition, calculation, and
simpler properties—14-18. The median: its definition,
calculation, and simpler properties—19-20. The mode : its
definition and relation to mean and median—21. Summary
comparison of the preceding forms of average—22-26. The
geometric mean : its definition, simpler properties, and the
cases in which it is specially applicable—27. The harmonic
mean : its definition and calculation . . 106-132
CHAPTER VIII.
MEASURES OF DISPERSION, ETC.
1. Inadequacy of the range as a measure of dispersion—
2-13. The standard deviation : its definition, calculation,
and properties—14-19. The mean deviation : its definition,
calculation, and properties—20-24. The quartile deviation
or semi-interquartile range—25. Measures of relative dispersion—
26. Measures of asymmetry or skewness—27-30.
The method of grades or percentiles . . . 133-156
CHAPTER IX.
CORRELATION.
1-3. The correlation table and its formation—4-5. The correlation
surface—6-7. The general problem—8-9. The line of means
of rows and the line of means of columns: their relative
positions in the case of independence and of varying degrees
of correlation—10-14. The correlation-coefficient, the regressions,
and the standard-deviations of arrays—15-16.
Numerical calculations—17. Certain points to be remembered
in calculating and using the coefficient . . . 157-190
CHAPTER X.
CORRELATION: ILLUSTRATIONS AND PRACTICAL
METHODS.
1. Necessity for careful choice of variables before proceeding to
calculate r—2-8. Illustration i.: Causation of pauperism—
9-10. Illustration ii.: Inheritance of fertility—11-13.

        <pb n="16" />
Illustration iii.: The weather and the crops—14. Corre-
lation between the movements of two variables: (a)
Non-periodic movements: Illustration iv.: changes in
infantile and general mortality—15-17. (b) Quasi-periodic
movements : Illustration v.: the marriage-rate and foreign
trade—18. Elementary methods of dealing with cases of
non-linear regression—19. Certain rough methods of approximating
to the correlation-coefficient—20-22. The correlation-ratio . . . 191-209
CHAPTER XI.
MISCELLANEOUS THEOREMS INVOLVING THE USE OF
THE CORRELATION-COEFFICIENT.
1. Introductory—2. Standard-deviation of a sum or difference—
3-5. Influence of errors of observation and of grouping on the
standard-deviation—6-7. Influence of errors of observation
on the correlation-coefficient (Spearman’s theorems)—8.
Mean and standard-deviation of an index—9. Correlation
between indices—10. Correlation-coefficient for a two × two-fold
table—11. Correlation-coefficient for all possible pairs of
values of a variable—12. Correlation due to heterogeneity
of material—13. Reduction of correlation due to mingling
of uncorrelated with correlated material—14-17. The
weighted mean—18-19. Application of weighting to the
correction of death-rates, etc., for varying sex and age-
distributions—20. The weighting of forms of average other
than the arithmetic mean . . 210-228
CHAPTER XII.
PARTIAL CORRELATION.
1-2. Introductory explanation—3. Direct deduction of the formulæ
for two variables—4. Special notation for the general
case : generalised regressions—5. Generalised correlations—
6. Generalised deviations and standard-deviations—
7-8. Theorems concerning the generalised product-sums—
9. Direct interpretation of the generalised regressions—
10-11. Reduction of the generalised standard-deviation—
12. Reduction of the generalised regression—13. Reduction
of the generalised correlation-coefficient—14. Arithmetical
work : Example i. ; Example ii.—15. Geometrical repre-
sentation of correlation between three variables by means of
a model—16. The coefficient of n-fold correlation—17. Expression
of regressions and correlations of lower in terms of
those of higher order—18. Limiting inequalities between
the values of correlation-coefficients necessary for consist-
ence—19. Fallacies . . . 229-253

        <pb n="17" />

PART III.—THEORY OF SAMPLING.
CHAPTER XIII.
SIMPLE SAMPLING OF ATTRIBUTES.
1. The problem of the present Part—2. The two chief divisions of
the theory of sampling—3. Limitation of the discussion to
the case of simple sampling—4. Definition of the chance of
success or failure of a given event—5. Determination of the
mean and standard-deviation of the number of successes in
n events—6. The same for the proportion of successes in n
events : the standard-deviation of simple sampling as a
measure of unreliability or its reciprocal as a measure of
precision—7. Verification of the theoretical results by ex-
periment—8. More detailed discussion of the assumptions
on which the formula for the standard-deviation of simple
sampling is based—9-10. Biological cases to which the
theory is directly applicable—11. Standard-deviation of
simple sampling when the numbers of observations in the
samples vary—12. Approximate value of the standard-
deviation of simple sampling, and relation between mean
and standard-deviation, when the chance of success or
failure is very small —13. Use of the standard-deviation of
simple sampling, or standard error, for checking and controlling
the interpretation of statistical results . . . 254-275

CHAPTER XIV.
SIMPLE SAMPLING CONTINUED: EFFECT OF REMOVING
THE LIMITATIONS OF SIMPLE SAMPLING.

1. Warning as to the assumption that three times the standard
error gives the range for the majority of fluctuations of
simple sampling of either sign—2. Warning as to the use
of the observed for the true value of p in the formula for
the standard error—3. The inverse standard error, or
standard error of the true proportion for a given observed
proportion : equivalence of the direct and inverse standard
errors when n is large—4-8. The importance of errors
other than fluctuations of “simple sampling” in practice :
unrepresentative or biassed samples—9-10. Effect of divergences
from the conditions of simple sampling: (a) effect
of variation in p and q for the several universes from which
the samples are drawn—11-12. (b) Effect of variation in
p and q from one sub-class to another within each universe—
13-14. (c) Effect of a correlation between the results of the
several events—15. Summary . . . 276-290

        <pb n="18" />
CHAPTER XV.
THE BINOMIAL DISTRIBUTION AND THE
NORMAL CURVE.

1-2. Determination of the frequency-distribution for the number
of successes in n events: the binomial distribution—3.
Dependence of the form of the distribution on p, q, and n—
4-5. Graphical and mechanical methods of forming representations
of the binomial distribution—6. Direct
calculation of the mean and the standard-deviation from
the distribution—7-8. Necessity of deducing, for use in
many practical cases, a continuous curve giving approxi-
mately, for large values of n, the terms of the binomial
series—9. Deduction of the normal curve as a limit to the
symmetrical binomial—10-11. The value of the central
ordinate—12. Comparison with a binomial distribution for
a moderate value of n—13. Outline of the more general
conditions from which the curve can be deduced by advanced
methods—14. Fitting the curve to an actual series of
observations—15. Difficulty of a complete test of fit by
elementary methods—16. The table of areas of the normal
curve and its use—17. The quartile deviation and the
“probable error”—18. Illustrations of the application of
the normal curve and of the table of areas . . 291-316

CHAPTER XVI.
NORMAL CORRELATION.

1-3. Deduction of the general expression for the normal correlation
surface from the case of independence—4. Constancy of the
standard-deviations of parallel arrays and linearity of the
regression—5. The contour lines : a series of concentric and
similar ellipses—6. The normal surface for two correlated
variables regarded as a normal surface for uncorrelated vari-
ables rotated with respect to the axes of measurement:
arrays taken at any angle across the surface are normal
distributions with constant standard-deviation : distribution
of and correlation between linear functions of two normally
correlated variables are normal : principal axes—7. Standard-
deviations round the principal axes—8-11. Investigation of
Table III., Chapter IX., to test normality: linearity of
regression, constancy of standard-deviation of arrays,
normality of distribution obtained by diagonal addition,
contour lines—12-13. Isotropy of the normal distribution
for two variables—14. Outline of the principal properties of
the normal distribution for n variables . . . 317-334

        <pb n="19" />
CHAPTER XVII.
THE SIMPLER CASES OF SAMPLING FOR VARIABLES:
PERCENTILES AND MEAN.
1-2. The problem of sampling for variables : the conditions
assumed—3. Standard error of a percentile—4. Special
values for the percentiles of a normal distribution—5.
Effect of the form of the distribution generally—6. Simplified
formula for the case of a grouped frequency-distribution—7.
Correlation between errors in two percentiles of the same
distribution—8. Standard error of the interquartile range
for the normal curve—9. Effect of removing the restrictions
of simple sampling, and limitations of interpretation—10.
Standard error of the arithmetic mean—11. Relative sta-
bility of mean and median in sampling—12. Standard error
of the difference between two means—13. The tendency to
normality of a distribution of means—14. Effect of removing
the restrictions of simple sampling—15. Statement of the
standard errors of standard-deviation, coefficient of variation,
correlation-coefficient, and regression—16. Restatement of
the limitations of interpretation if the sample be small . . . 335-356
APPENDIX I.—Tables for facilitating Statistical Work . . . 357-359
APPENDIX II.—Short List of Works on the Mathematical Theory
of Statistics, and the Theory of Probability . . . 360-361
SUPPLEMENTS—
I. NOTES SUPPLEMENTARY TO CHAP. VI. . . . 362-364
II. DIRECT DEDUCTION OF THE FORMULÆ FOR REGRESSIONS . . . 365-366
III. THE LAW OF SMALL CHANCES . . . 366-370
IV. GOODNESS OF FIT . . . 370-389
ADDITIONAL REFERENCES . . . 390-398
ANSWERS TO, AND HINTS ON THE SOLUTION OF, THE EXERCISES
GIVEN . . . 399-406
INDEX . . . 407-422

        <pb n="20" />
        <pb n="21" />
        THEORY OF STATISTICS.
INTRODUCTION.

1-3. The introduction of the terms “statistics,” “statistical,” into the English
language—4-6. The change in meaning of these terms during the
nineteenth century—7-9. The present use of the terms—10. Definitions
of “statistics,” “statistical methods,” “theory of statistics,” in
accordance with present usage.

1. THE words “statist,” “statistics,” “statistical,” appear to be
all derived, more or less indirectly, from the Latin status, in the
sense that it acquired in medieval Latin of a political state.

2. The first term is, however, of much earlier date than the two
others. The word “statist” is found, for instance, in Hamlet
(1602),1 Cymbeline (1610 or 1611),2 and in Paradise Regained
(1671).3 The earliest occurrence of the word “statistics” yet
noted is in The Elements of Universal Erudition, by Baron J. F.
von Bielfeld, translated by W. Hooper, M.D. (3 vols., London, 1770).
One of its chapters is entitled Statistics, and contains a definition
of the subject as “The science that teaches us what is the political
arrangement of all the modern states of the known world.”4
“Statistics” occurs again with a rather wider definition in the
preface to A Political Survey of the Present State of Europe, by
E. A. W. Zimmermann,5 issued in 1787. “It is about forty
years ago,” says Zimmermann, “that that branch of political
knowledge, which has for its object the actual and relative
power of the several modern states, the power arising from their
natural advantages, the industry and civilisation of their inhabitants,
and the wisdom of their governments, has been formed, chiefly
by German writers, into a separate science. . . . By the more con-
venient form it has now received . . . . this science, distinguished
by the new-coined name of statistics, is become a favourite study
in Germany” (p. ii) ; and the adjective is also given (p. v), “To
the several articles contained in this work, some respectable

1 Act v., sc. 2.

2 Act ii., sc. 4.

3 Bk. iv.

4 I cite from Dr W. F. Willcox, Quarterly Publications of the American
Statistical Association, vol. xiv., 1914, p. 287.

5 Zimmermann’s work appears to have been written in English, though he
was a German, Professor of Natural Philosophy at Brunswick.
        <pb n="22" />
statistical writers have added a view of the principal epochas of the
history of each country.”

3. Within the next few years the words were adopted by several
writers, notably by Sir John Sinclair, the editor and organiser of the
first Statistical Account of Scotland,1 to whom, indeed, their introduction
has been frequently ascribed. In the circular letter to the
Clergy of the Church of Scotland issued in May 1790,2 he states
that in Germany “‘Statistical Inquiries,’ as they are called, have
been carried to a very great extent,” and adds an explanatory
footnote to the phrase “Statistical Inquiries”—“or inquiries
respecting the population, the political circumstances, the productions
of a country, and other matters of state.” In the
“History of the Origin and Progress”3 of the work, he tells us,
“Many people were at first surprised at my using the new words,
Statistics and Statistical, as it was supposed that some term in our
own language might have expressed the same meaning. But in
the course of a very extensive tour, through the northern parts of
Europe, which I happened to take in 1786, I found that in
Germany they were engaged in a species of political enquiry,
to which they had given the name of Statistics;4 . . . . as I
thought that a new word might attract more public attention,
I resolved on adopting it, and I hope that it is now completely
naturalised and incorporated with our language.” This hope
was certainly justified, but the meaning of the word underwent
rapid development during the half century or so following its
introduction.

4. “Statistics” (statistik), as the term is used by German
writers of the eighteenth century, by Zimmermann and by Sir
John Sinclair, meant simply the exposition of the noteworthy
characteristics of a state, the mode of exposition being—almost
inevitably at that time—preponderantly verbal. The conciseness
and definite character of numerical data were recognised at a
comparatively early period—more particularly by English writers
—but trustworthy figures were scarce. After the commencement
of the nineteenth century, however, the growth of official data
was continuous, and numerical statements, accordingly, began
more and more to displace the verbal descriptions of earlier days.
“Statistics” thus insensibly acquired a narrower signification, viz.,

1 Twenty-one vols., 1791-99.

2 Statistical Account, vol. xx., Appendix to “The History of the Origin and
Progress . . . .” given at the end of the volume.

3 Loc. cit., p. xiii.

4 The Abriss der Staatswissenschaft der Europäischen Reiche (1749) of Gottfried
Achenwall, Professor of Politics at Göttingen, is the volume in which the word
“statistik” appears to be first employed, but the adjective “statisticus”
occurs at a somewhat earlier date in works written in Latin.

        <pb n="23" />

the exposition of the characteristics of a State by numerical
methods. It is difficult to say at what epoch the word came
definitely to bear this quantitative meaning, but the transition
appears to have been only half accomplished even after the founda-
tion of the Royal Statistical Society in 1834. The articles in the
first volume of the Journal, issued in 1838-9, are for the most
part of a numerical character, but the official definition has no
reference to method. “Statistics,” we read, “may be said, in the
words of the prospectus of this Society, to be the ascertain-
ing and bringing together of those facts which are calculated to
illustrate the condition and prospects of society.” It is, however,
admitted that “the statist commonly prefers to employ figures
and tabular exhibitions.”

5. Once, however, the first change of meaning was accomplished,
further changes followed. From the name of a science or art of
state-description by numerical methods, the word was transferred to
those series of figures with which it operated, as we speak of vital
statistics, poor-law statistics, and so forth. But similar data
occur in many connections; in meteorology, for instance, in anthropology,
etc. Such collections of numerical data were also termed
“statistics,” and consequently, at the present day, the word is
held to cover a collection of numerical data, analogous to those
which were originally formed for the study of the state, on almost
any subject whatever. We not only read of rainfall “statistics,”
but of “statistics” showing the growth of an organisation for
recording rainfall.2 We find a chapter headed “Statistics” in a
book on psychology,3 and the author, writing of “statistics concerning
the mental characteristics of man,” “statistics of children,
under the headings bright—average—dull.”4 We are informed
that, in a book on Latin verse, the characteristics of the Virgilian
hexameter “are examined carefully with statistics.”5

6. The development in meaning of the adjective “statistical”
was naturally similar. The methods applied to the study of
numerical data concerning the state were still termed “statistical
methods,” even when applied to data from other sources. Thus
we read of the inheritance of genius being treated “in a statistical
manner,”6 and we have now “a journal for the statistical
study of biological problems.”7 Such phrases as “the statistical

1 Jour. Stat. Soc., vol. i. p. 1.

2 Symons’ British Rainfall for 1899, p. 15.

3 E. W. Scripture, The New Psychology, 1897, chap. ii.

4 Op. cit. p. 18.

5 Athenæum, Oct. 3, 1903.

6 Francis Galton, Hereditary Genius (Macmillan, 1869), preface.

7 Biometrika, Cambridge Univ. Press, the first number issued in 1901.

        <pb n="24" />
investigation of the motion of molecules”1 have become part of
the ordinary language of physicists. We find a work entitled
“the principles of statistical mechanics”2 and the Bakerian
lecture for 1909, by Sir J. Larmor, was on “the statistical and
thermodynamical relations of radiant energy.”

7. It is unnecessary to multiply such instances to show that the
words “statistics,” “statistical,” no longer bear any necessary
reference to “matters of state.” They are applied indifferently in
physics, biology, anthropology, and meteorology, as well as in the
social sciences. Diverse though these cases are, there must be
some community of character between them, or the same terms
and the same methods would not be applied. What, then, is this
common character ?

8. Let us turn to social science, as the parent of the methods
termed “statistical,” for a moment, and consider its characteristics
as compared, say, with physics or chemistry. One characteristic
stands out so markedly that attention has been repeatedly
directed to it by “statistical” writers as the source of the peculiar
difficulties of their science—the observer of social facts cannot experiment, but must deal with circumstances as they occur, apart
from his control. Now the object of experiment is to replace the
complex systems of causation usually occurring in nature by
simple systems in which only one causal circumstance is permitted
to vary at a time. This simplification being impossible, the
observer has, in general, to deal with highly complicated cases of
multiple causation—cases in which a given result may be due to
any one of a number of alternative causes or to a number of
different causes acting conjointly.

9. A little consideration will show, however, that this is also
precisely the characteristic of the observations in other fields to
which statistical methods are applied. The meteorologist, for
example, is in almost precisely the same position as the student
of social science. He can experiment on minor points, but the
records of the barometer, thermometer, and rain gauge have to be
treated as they stand. With the biologist, matters are in some-
what better case. He can and does apply experimental methods
to a very large extent, but frequently cannot approximate
closely to the experimental ideal ; the internal circumstances of
animals and plants too easily evade complete control. Hence a
large field (notably the study of variation and heredity) is left,
in which statistical methods have either to aid or to replace the
methods of experiment. The physicist and chemist, finally,

1 Clerk Maxwell, “Theory of Heat” (1871), and “On Boltzmann’s
Theorem” (1878), Camb. Phil. Trans., vol. xii.
2 By J. Willard Gibbs (Macmillan, 1902).

        <pb n="25" />
        INTRODUCTION.

stand at the other extremity of the scale. Theirs are the
sciences in which experiment has been brought to its greatest
perfection. But even so, statistical methods still find application.
In the first place, the methods available for eliminating the effect
of disturbing circumstances, though continually improved, are not,
and cannot be, absolutely perfect. The observer himself, as well
as the observing instrument, is a source of error; the effects of
changes of temperature, or of moisture, of pressure, draughts, vibration, cannot be completely eliminated. Further, in the problems
of molecular physics, referred to in the last sentences of § 6,
multiplicity of causes is of the essence of the case. The motion
of an atom or of a molecule in the middle of a swarm is dependent
on that of every other atom or molecule in the swarm.

10. In the light of this discussion, we may accordingly give the
following definitions :—

By statistics we mean quantitative data affected to a marked
extent by a multiplicity of causes.

By statistical methods we mean methods specially adapted to
the elucidation of quantitative data affected by a multiplicity of
causes.
By theory of statistics we mean the exposition of statistical
methods.

The insertion in the first definition of some such words as “to
a marked extent” is necessary, since the term “statistics” is not
usually applied to data, like those of the physicist, which are
affected only by a relatively small residuum of disturbing causes.
At the same time, “statistical methods” are applicable to all such
cases, whether the influence of many causes be large or not.

REFERENCES.
The History of the Words “Statistics,” “Statistical.”

(1) JOHN, V., Der Name Statistik; Weiss, Berne, 1883. A translation in
Jour. Roy. Stat. Soc. for same year.

(2) YULE, G. U., “The Introduction of the Words ‘Statistics,’ ‘Statistical,’
into the English Language,” Jour. Roy. Stat. Soc., vol. lxviii., 1905,
p. 391.

The History of Statistics in General.

(3) JOHN, V., Geschichte der Statistik, 1. Teil, bis auf Quetelet; Enke,
Stuttgart, 1884. (All published; the author died in 1900. By far the
best history of statistics down to the early years of the nineteenth
century.)

(4) MOHL, ROBERT VON, Geschichte und Litteratur der Staatswissenschaften,
3 vols.; Enke, Erlangen, 1855-58. (For history of statistics see
principally latter half of vol. iii.)

        <pb n="26" />

(5) GABAGLIO, ANTONIO, Teoria generale della statistica, 2 vols.; Hoepli,
Milano, 2nd edn., 1888. (Vol. i., Parte storica.)
Several works on theory of statistics include short histories, e.g.
H. Westergaard’s Die Grundzüge der Theorie der Statistik (Fischer,
Jena, 1890), and P. A. Meitzen’s Geschichte, Theorie und Technik der
Statistik (new edn., 1903; American translation by R. P. Falkner,
1891). There is no detailed history in English, but the article
“Statistics” in the Encyclopædia Britannica (11th edn.) gives a very
slight sketch, and the biographical articles in Palgrave’s Dictionary of
Political Economy are useful. For its importance as regards the English
school of political arithmetic, reference may also be made to—

(6) HULL, C. H., The Economic Writings of Sir William Petty, together
with the Observations on the Bills of Mortality more probably by Captain
John Graunt, Cambridge University Press, 2 vols., 1899.

History of Theory of Statistics.
Somewhat slight information is given in the general works cited.
From the purely mathematical side the following is important:—

(7) TODHUNTER, I., A History of the Mathematical Theory of Probability
from the time of Pascal to that of Laplace; Macmillan, 1865.

History of Official Statistics.

(8) BERTILLON, J., Cours élémentaire de statistique; Société d’éditions
scientifiques, 1895. (Gives an exceedingly useful outline of the history
of official statistics in different countries.)

        <pb n="27" />
PART I.—THE THEORY OF ATTRIBUTES.
CHAPTER I.
NOTATION AND TERMINOLOGY.

1-2. Statistics of attributes and statistics of variables : fundamental character
of the former—3-5. Classification by dichotomy—6-7. Notation for
single attributes and for combinations—8. The class-frequency—9.
Positive and negative attributes, contraries—10. The order of a class—
11. The aggregate—12. The arrangement of classes by order and
aggregate—13-14. Sufficiency of the tabulation of the ultimate class-
frequencies—15-17. Or, better, of the positive class-frequencies—18.
The class-frequencies chosen in the census for tabulation of statistics
of infirmities—19. Inclusive and exclusive notations and terminologies.

1. THE methods of statistics, as defined in the Introduction,
deal with quantitative data alone. The quantitative character
may, however, arise in two different ways.

In the first place, the observer may note only the presence or
absence of some attribute in a series of objects or individuals, and
count how many do or do not possess it. Thus, in a given
population, we may count the number of the blind and seeing,
the dumb and speaking, or the insane and sane. The quantitative
character, in such cases, arises solely in the counting.

In the second place, the observer may note or measure the
actual magnitude of some variable character for each of the
objects or individuals observed. He may record, for instance, the
ages of persons at death, the prices of different samples of a
commodity, the statures of men, the numbers of petals in flowers.
The observations in these cases are quantitative ab initio.

2. The methods applicable to the former kind of observations,
which may be termed statistics of attributes, are also applicable
to the latter, or statistics of variables. A record of statures of
men, for example, may be treated by simply counting as tall all
measurements that exceed a certain limit, neglecting the magnitude
of excess or defect, and stating the numbers of tall and short (or
        <pb n="28" />

more strictly not-tall) on the basis of this classification. Similarly,
the methods that are specially adapted to the treatment of
statistics of variables, making use of each value recorded, are
available to a greater extent than might at first sight seem possible
for dealing with statistics of attributes. For example, we may
treat the presence or absence of the attribute as corresponding to
the changes of a variable which can only possess two values, say
0 and 1. Or, we may assume that we have really to do with a
variable character which has been crudely classified, as suggested
above, and we may be able, by auxiliary hypotheses as to the
nature of this variable, to draw further conclusions. But the
methods and principles developed for the case in which the observer
only notes the presence or absence of attributes are the simplest
and most fundamental, and are best considered first. This and
the next three chapters (Chapters I.-IV.) are accordingly devoted
to the Theory of Attributes.
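The Python sketch below illustrates the two treatments described in § 2. The statures and the dividing limit are invented for the example; they are not data from the text. First the variable is reduced to the attribute “tall” by an arbitrary limit. Then the attribute is coded as a variable taking only the values 0 and 1, whose mean is simply the proportion of tall men.

```python
# Hypothetical statures in inches, invented for illustration only.
statures = [64.5, 67.0, 70.2, 66.1, 72.3, 68.9, 65.4, 69.5]
LIMIT = 68.0  # arbitrary dividing value: above it a man counts as "tall"


def dichotomize(values, limit):
    """Reduce a variable to an attribute: 1 if the value exceeds limit, else 0."""
    return [1 if value > limit else 0 for value in values]


tall = dichotomize(statures, LIMIT)
n_tall = sum(tall)                    # number of "tall" men
n_not_tall = len(tall) - n_tall       # number of "not-tall" men
proportion_tall = n_tall / len(tall)  # mean of the 0/1 variable

print(tall, n_tall, n_not_tall, proportion_tall)
```

Changing `LIMIT` changes the counts but not the procedure, which is one way of seeing that the boundary between the two classes may be wholly arbitrary.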

3. The objects or individuals that possess the attribute, and
those that do not possess it, may be said to be members of two
distinct classes, the observer classifying the objects or individuals
observed. In the simplest case, where attention is paid to one
attribute alone, only two mutually exclusive classes are formed.
If several attributes are noted, the process of classification may,
however, be continued indefinitely. Those that do and do not
possess the first attribute may be reclassified according as they do
or do not possess the second, the members of each of the sub-
classes so formed according as they do or do not possess the
third, and so on, every class being divided into two at each step.
Thus the members of the population of any district may be
classified into males and females; the members of each sex into
sane and insane; the insane males, sane males, insane females,
and sane females into blind and seeing. If we were dealing with
a number of peas (Pisum sativum) of different varieties, they
might be classified as tall or dwarf, with green seeds or yellow
seeds, with wrinkled seeds or round seeds, so that we would have
eight classes—tall with round green seeds, tall with round yellow
seeds, tall with wrinkled green seeds, tall with wrinkled yellow
seeds, and four similar classes of dwarf plants.
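Repeated division by dichotomy amounts to choosing one alternative from each attribute in every possible way, that is, a Cartesian product. The minimal Python sketch below applies `itertools.product` to the pea characters of § 3. The helper name is hypothetical. It recovers the eight classes in the order listed above, with the tall classes first.

```python
from itertools import product

# Each attribute is one dichotomy: a pair of mutually exclusive alternatives.
pea_attributes = [
    ("tall", "dwarf"),
    ("round", "wrinkled"),
    ("green", "yellow"),
]


def classes_by_dichotomy(attributes):
    """Every class obtained by dividing each class in two at each step."""
    return list(product(*attributes))


pea_classes = classes_by_dichotomy(pea_attributes)
for cls in pea_classes:
    print(" / ".join(cls))
print(len(pea_classes), "classes")  # 2 ** 3 = 8
```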

4. It may be noticed that the fact of classification does not
necessarily imply the existence of either a natural or a clearly
defined boundary between the two classes. The boundary may
be wholly arbitrary, e.g. where prices are classified as above or
below some special value, barometer readings as above or below
some particular height. The division may also be vague and
uncertain : sanity and insanity, sight and blindness, pass
into each other by such fine gradations that judgments may

        <pb n="29" />
        I.—NOTATION AND TERMINOLOGY.
differ as to the class in which a given individual should be
entered. The possibility of uncertainties of this kind should
always be borne in mind in considering statistics of attributes:
whatever the nature of the classification, however, natural or
artificial, definite or uncertain, the final judgment must be de-
cisive ; any one object or individual must be held either to possess
the given attribute or not.

5. A classification of the simple kind considered, in which each
class is divided into two sub-classes and no more, has been termed
by logicians classification, or, to use the more strictly applicable
term, division by dichotomy (cutting in two). The classifications of most statistics are not dichotomous, for most usually a
class is divided into more than two sub-classes, but dichotomy is
the fundamental case. In Chapter V. the relation of dichotomy
to more elaborate (manifold, instead of twofold or dichotomous)
processes of classification, and the methods applicable to some
such cases, are dealt with briefly.

6. For theoretical purposes it is necessary to have some simple
notation for the classes formed, and for the numbers of observations assigned to each.

The capitals A, B, C, . . . will be used to denote the several
attributes. An object or individual possessing the attribute A
will be termed simply A. The class, all the members of which
possess the attribute A, will be termed the class A. It is convenient to use single symbols also to denote the absence of the
attributes A, B, C, . . . We shall employ the Greek letters α,
β, γ, . . . Thus if A represents the attribute blindness, α
represents sight, i.e. non-blindness; if B stands for deafness, β
stands for hearing. Generally “α” is equivalent to “non-A,” or
an object or individual not possessing the attribute A; the class α
is equivalent to the class none of the members of which possess the
attribute A.

7. Combinations of attributes will be represented by juxtapositions of letters. Thus if, as above, A represents blindness, B
deafness, AB represents the combination blindness and deafness.
If the presence and absence of these attributes be noted, the four
classes so formed, viz. AB, Aβ, αB, αβ, include respectively the
blind and deaf, the blind but not-deaf, the deaf but not-blind, and
the neither blind nor deaf. If a third attribute be noted, e.g. insanity, denoted say by C, the class ABC includes those who are
at once deaf, blind, and insane, ABγ those who are deaf and blind
but not insane, and so on.

Any letter or combination of letters like A, AB, αB, ABγ, by
means of which we specify the characters of the members of a class,
may be termed a class symbol.

        <pb n="30" />

8. The number of observations assigned to any class is termed,
for brevity, the frequency of the class, or the class-frequency.
Class-frequencies will be denoted by enclosing the corresponding
class-symbols in brackets. Thus—

(A) denotes number of A’s, i.e. objects possessing attribute A
(α) denotes number of α’s, i.e. objects not possessing attribute A
(AB) denotes number of AB’s, i.e. objects possessing attributes A and B
(αB) denotes number of αB’s, i.e. objects possessing attribute B but not A
(ABC) denotes number of ABC’s, i.e. objects possessing attributes A, B, and C
(αBC) denotes number of αBC’s, i.e. objects possessing attributes B and C but not A
(αβC) denotes number of αβC’s, i.e. objects possessing attribute C but neither A nor B
and so on for any number of attributes. If A represent, as in
the illustration above, blindness, B deafness, C insanity, the
symbols given stand for the numbers of the blind, the not-blind,
the blind and deaf, the deaf but not blind, the blind, deaf, and insane, the deaf and insane but not blind, and the insane but neither
blind nor deaf, respectively.
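The Python sketch below shows one way to do the counting described in § 8, under stated assumptions. Each individual is a record mapping the letters A, B, C to True (present) or False (absent), the five records are invented, and `class_frequency` is a hypothetical helper that does not appear in the text. A capital in the class-symbol requires the attribute and a Greek letter requires its absence. Attributes not named are left free, so the count is inclusive, and the empty symbol returns the whole number of observations N.

```python
# Greek letters stand for the absence of the attribute named by the capital.
NEGATIVE = {"α": "A", "β": "B", "γ": "C", "δ": "D"}


def class_frequency(records, symbol):
    """Count the records belonging to the class named by `symbol`.

    Each record maps an attribute letter to True (present) or False (absent).
    Attributes not named in the symbol are ignored, so the count is inclusive.
    """
    def belongs(record):
        return all(
            (not record[NEGATIVE[ch]]) if ch in NEGATIVE else record[ch]
            for ch in symbol
        )

    return sum(1 for record in records if belongs(record))


# Invented individuals; A = blindness, B = deafness, C = insanity as in § 7.
records = [
    {"A": True, "B": True, "C": False},
    {"A": True, "B": False, "C": False},
    {"A": False, "B": True, "C": True},
    {"A": False, "B": False, "C": True},
    {"A": False, "B": False, "C": False},
]

for symbol in ["", "A", "α", "AB", "αB", "αBC", "αβC"]:
    print(symbol or "N", class_frequency(records, symbol))
```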

9. The attributes denoted by capitals A, B, C, . . . may be
termed positive attributes, and their contraries denoted by Greek
letters negative attributes. If a class-symbol include only
capital letters, the class may be termed a positive class; if only
Greek letters, a negative class. Thus the classes A, AB, ABC
are positive classes; the classes α, αβ, αγ, negative classes.

If two classes are such that every attribute in the symbol for
the one is the negative or contrary of the corresponding attribute
in the symbol for the other, they may be termed contrary classes
and their frequencies contrary frequencies; e.g. AB and αβ, Aβ
and αB, AβC and αBγ, are pairs of contraries.

10. The classes obtained by noting say n attributes fall into
natural groups according to the numbers of attributes used to
specify the respective classes, and these natural groups should be
borne in mind in tabulating the class-frequencies. A class
specified by r attributes may be spoken of as a class of the rth
order and its frequency as a frequency of the rth order. Thus AB,
AC, BC are classes of the second order; (A), (AB), (αBC),
(ABγD), class-frequencies of the first, second, third, and fourth
orders respectively.

11. The classes of one and the same order fall into further
groups according to the actual attributes specified. Thus if three
attributes A, B, C have been noted, the classes of the second order
may be specified by any one of the pairs of attributes AB, AC, or
BC (and their contraries). The series of classes or class-frequencies given by any one positive class and the classes whose symbols
are derived therefrom by substituting Greek letters for one or
more of the italic capital letters in every possible way will be
termed an aggregate. Thus (AB) (Aβ) (αB) (αβ) form an aggregate
        <pb n="31" />
of frequencies of the second order, and the twelve classes of
the second order which can be formed where three attributes
have been noted may be grouped into three such aggregates.

12. Class-frequencies may, in tabulating, be arranged so that
frequencies of the same order and frequencies belonging to the
same aggregate are kept together. Thus the frequencies for the
case of three attributes should be grouped as given below; the
whole number of observations denoted by the letter N being
reckoned as a frequency of order zero, since no attributes are
specified:—

Order 0.  N
Order 1.  (A)    (B)    (C)
          (α)    (β)    (γ)
Order 2.  (AB)   (AC)   (BC)
          (Aβ)   (Aγ)   (Bγ)
          (αB)   (αC)   (βC)      . . . (1)
          (αβ)   (αγ)   (βγ)
Order 3.  (ABC)  (αBC)
          (ABγ)  (αBγ)
          (AβC)  (αβC)
          (Aβγ)  (αβγ)

13. In such a complete table for the case of three attributes,
twenty-seven distinct frequencies are given :—1 of order zero, 6
of the first order, 12 of the second, and 8 of the third. It
is, however, in no case necessary to give such a complete
statement.
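The grouping of § 12 and the count of twenty-seven can be reproduced mechanically. In the Python sketch below, whose helper names are hypothetical, each positive class-symbol of order r generates an aggregate of 2ʳ symbols by replacing capitals with Greek letters in every possible way. The aggregates are then collected by order. Each of n attributes may appear as a capital, as a Greek letter, or not at all, so the complete table always holds 3ⁿ frequencies. For three attributes this gives 1 + 6 + 12 + 8 = 27.

```python
from itertools import combinations, product

GREEK = {"A": "α", "B": "β", "C": "γ", "D": "δ"}


def aggregate(positive):
    """All symbols got by replacing any of the capitals by Greek letters."""
    choices = [(letter, GREEK[letter]) for letter in positive]
    return ["".join(symbol) for symbol in product(*choices)]


def complete_table(letters):
    """Map each order r to its aggregates, one per positive class of order r."""
    return {
        r: [aggregate("".join(c)) for c in combinations(letters, r)]
        for r in range(len(letters) + 1)
    }


table = complete_table("ABC")
for order, aggregates in table.items():
    # The empty symbol "" of order 0 stands for N.
    print("Order", order, aggregates)

counts = [sum(len(agg) for agg in table[r]) for r in table]
print(counts, sum(counts))  # [1, 6, 12, 8] 27
```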

The whole number of observations must clearly be equal to the
number of A’s together with the number of α’s, the number of
A’s to the number of A’s that are B together with the number of
A’s that are not B; and so on,—i.e. any class-frequency can always
be expressed in terms of class-frequencies of higher order. Thus—

N = (A) + (α) = (B) + (β) = etc.
  = (AB) + (Aβ) + (αB) + (αβ) = etc.
(A) = (AB) + (Aβ) = (AC) + (Aγ) = etc.        . . . (2)
(AB) = (ABC) + (ABγ) = etc.

Hence, instead of enumerating all the frequencies as under (1),
no more need be given, for the case of three attributes, than
the eight frequencies of the third order. If four attributes had
been noted it would be sufficient to give the sixteen frequencies of
the fourth order.

The classes specified by all the attributes noted in any case,
i.e. classes of the nth order in the case of n attributes, may be
        <pb n="32" />
termed the ultimate classes and their frequencies the ultimate
frequencies. Hence we may say that it is never necessary to
enumerate more than the ultimate frequencies. All the others can
be obtained from these by simple addition.

Example i.—(See reference 5 at the end of the chapter.)
A number of school children were examined for the presence
or absence of certain defects of which three chief descriptions
were noted, A development defects, B nerve signs, C low
nutrition.

Given the following ultimate frequencies, find the frequencies
of the positive classes, including the whole number of observations N.

(ABC)   57      (αBC)   78
(ABγ)  281      (αBγ)  670
(AβC)   86      (αβC)   65
(Aβγ)  453      (αβγ) 8310

The whole number of observations N is equal to the grand
total: N = 10,000.

The frequency of any first-order class, e.g. (A), is given by the
total of the four third-order frequencies, the class-symbols for
which contain the same letter—

(ABC) + (ABγ) + (AβC) + (Aβγ) = (A) = 877.

Similarly, the frequency of any second-order class, e.g. (AB), is
given by the total of the two third-order frequencies, the class-symbols for which both contain the same pair of letters—

(ABC) + (ABγ) = (AB) = 338.
The complete results are—
N     10,000     (AB)   338
(A)      877     (AC)   143
(B)    1,086     (BC)   135
(C)      286     (ABC)   57
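The additions of Example i can be checked with a short Python sketch. It stores the ultimate frequencies in a dictionary keyed by class-symbol, a representation chosen here only for illustration. For each positive class it sums every ultimate frequency whose symbol contains all the capitals of that class. The empty symbol therefore gives N.

```python
from itertools import combinations

# Ultimate frequencies of Example i (A development defects,
# B nerve signs, C low nutrition).
ultimate = {
    "ABC": 57, "αBC": 78,
    "ABγ": 281, "αBγ": 670,
    "AβC": 86, "αβC": 65,
    "Aβγ": 453, "αβγ": 8310,
}


def from_ultimate(ultimate, positive):
    """Sum the ultimate frequencies whose symbols contain every capital in `positive`."""
    return sum(f for sym, f in ultimate.items() if all(ch in sym for ch in positive))


positives = {}
for r in range(4):
    for combo in combinations("ABC", r):
        letters = "".join(combo)
        positives[letters or "N"] = from_ultimate(ultimate, letters)

print(positives)
# {'N': 10000, 'A': 877, 'B': 1086, 'C': 286, 'AB': 338, 'AC': 143, 'BC': 135, 'ABC': 57}
```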

14. The number of ultimate frequencies in the general case of
n attributes, or the number of classes in an aggregate of the nth
order, is given by considering that each letter of the class-symbol
may be written in two ways (A or α, B or β, C or γ), and that
either way of writing one letter may be combined with either
way of writing another. Hence the whole number of ways in
which the class-symbol may be written, i.e. the number of
classes, is—

2 × 2 × 2 × . . . to n factors = 2ⁿ.
        <pb n="33" />

The ultimate frequencies form one natural set in terms of which
the data are completely given, but any other set containing the
same number of algebraically independent frequencies, viz. 2ⁿ,
may be chosen instead.

15. The positive class-frequencies, including under this head the
total number of observations N, form one such set. They are algebraically independent; no one positive class-frequency can be expressed wholly in terms of the others. Their number is, moreover,
2ⁿ, as may be readily seen from the fact that if the Greek letters
are struck out of the symbols for the ultimate classes, they become
the symbols for the positive classes, with the exception of αβγ
. . . for which N must be substituted. Otherwise the number
is made up as follows:—

Order 0. (The whole number of observations) . . . 1
Order 1. (The number of attributes noted) . . . n
Order 2. (The number of combinations of n things 2 together) . . . n(n − 1)/1·2
Order 3. (The number of combinations of n things 3 together) . . . n(n − 1)(n − 2)/1·2·3
and so on. But the series

1 + n + n(n − 1)/1·2 + n(n − 1)(n − 2)/1·2·3 + . . .

is the binomial expansion of (1 + 1)ⁿ or 2ⁿ, therefore the total
number of positive classes is 2ⁿ.
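The count in § 15 is easy to confirm numerically. The Python sketch below is an illustrative check only. It uses `math.comb` (available from Python 3.8) to add up the numbers of positive classes of each order and compares the total with 2ⁿ.

```python
from math import comb


def positive_class_count(n):
    """Number of positive classes, orders 0 to n, when n attributes are noted."""
    # comb(n, r) is the number of combinations of n things r together,
    # i.e. the number of positive classes of order r.
    return sum(comb(n, r) for r in range(n + 1))


for n in range(1, 7):
    terms = [comb(n, r) for r in range(n + 1)]
    assert positive_class_count(n) == 2 ** n
    print(n, terms, positive_class_count(n))
```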

16. The set of positive class-frequencies is a most convenient
one for both theoretical and practical purposes.

Compare, for instance, the two forms of statement, in terms of
the ultimate and the positive classes respectively, as given in
Example i., § 13. The latter gives directly the whole number of
observations and the totals of A’s, B’s, and C’s. The former gives
none of these fundamentally important figures without the performance of more or less lengthy additions. Further, the latter gives
the second-order frequencies (AB), (AC), and (BC), which are necessary for discussing the relations subsisting between A, B, and C, but
are only indirectly given by the frequencies of the ultimate classes.

17. The expression of any class-frequency in terms of the
positive frequencies is most easily obtained by a process of step-
by-step substitution ; thus—

(αβ) = (α) − (αB)
     = N − (A) − (B) + (AB)        . . . (3)
(αβγ) = (αβ) − (αβC)
      = N − (A) − (B) + (AB) − (αC) + (αBC)
      = N − (A) − (B) − (C) + (AB) + (AC) + (BC) − (ABC)        . . . (4)

        <pb n="34" />

Arithmetical work, however, should be executed from first
principles, and not by quoting formulæ like the above.

Example ii.—Check the work of Example i., § 13, by finding the
frequencies of the ultimate classes from the frequencies of the
positive classes.

(ABγ) = (AB) − (ABC) = 338 − 57 = 281
(Aβγ) = (Aγ) − (ABγ) = (A) − (AC) − (ABγ)
      = 877 − 143 − 281 = 453
(αβγ) = (βγ) − (Aβγ) = N − (B) − (C) + (BC) − (Aβγ)
      = 10,000 − 1086 − 286 + 135 − 453
      = 10,135 − 1825 = 8310
and so on.
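The step-by-step substitution of § 17 always ends in an alternating sum over the Greek letters of the symbol. Modern texts call this the inclusion-exclusion principle, a later name rather than a term used here. The minimal Python sketch below holds the positive frequencies in a dictionary keyed by capitals in alphabetical order, with the empty key standing for N. It recovers all eight ultimate frequencies of Example i and so carries the check of Example ii through in full. It is offered only as a verification aid; as the text advises, arithmetical work itself is best done from first principles.

```python
from itertools import combinations

GREEK_TO_CAPITAL = {"α": "A", "β": "B", "γ": "C", "δ": "D"}

# Positive class-frequencies of Example i; the empty key "" stands for N.
positive = {
    "": 10000, "A": 877, "B": 1086, "C": 286,
    "AB": 338, "AC": 143, "BC": 135, "ABC": 57,
}


def from_positive(symbol, positive):
    """Express any class-frequency through positive ones by inclusion-exclusion."""
    capitals = [ch for ch in symbol if ch not in GREEK_TO_CAPITAL]
    negated = [GREEK_TO_CAPITAL[ch] for ch in symbol if ch in GREEK_TO_CAPITAL]
    total = 0
    for r in range(len(negated) + 1):
        for subset in combinations(negated, r):
            # Add or subtract the positive class naming the capitals plus this subset.
            key = "".join(sorted(capitals + list(subset)))
            total += (-1) ** r * positive[key]
    return total


for symbol in ["ABC", "ABγ", "AβC", "Aβγ", "αBC", "αBγ", "αβC", "αβγ"]:
    print(symbol, from_positive(symbol, positive))
```

The same function also handles lower-order symbols. For example, `from_positive("αβ", positive)` evaluates formula (3) for these data.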

18. Examples of statistics of precisely the kind now under
consideration are afforded by the census returns, e.g., of 1891 or
1901, for England and Wales, of persons suffering from different
“infirmities,” any individual who is deaf and dumb, blind or
mentally deranged (lunatic, imbecile, or idiot) being required to
be returned as such on the schedule. The classes chosen for
tabulation are, however, neither the positive nor the ultimate
classes, but the following (neglecting minor distinctions amongst
the mentally deranged and the returns of persons who are deaf
but not dumb) :—Dumb, blind, mentally deranged ; dumb and
blind but not deranged; dumb and deranged but not blind;
blind and deranged but not dumb ; blind, dumb, and deranged.
If, in the symbolic notation, deaf-mutism be denoted by A, blindness by B, and mental derangement by C, the class-frequencies
thus given are (A), (B), (C), (ABγ), (AβC), (αBC), (ABC) (cf.
Census of England and Wales, 1891, vol. iii., tables 15 and 16,
p. vii. Census of 1901, Summary Tables, table xlix.). This set of
frequencies does not appear to possess any special advantages.
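Converting the census set of § 18 to positive frequencies needs only three additions, since (AB) = (ABγ) + (ABC), and similarly for (AC) and (BC). The total N is not among the classes tabulated and has to be supplied from the population figure. The Python sketch below is illustrative and its function name is hypothetical. To avoid anticipating the exercises, it is applied to census-style figures formed from the ultimate frequencies of Example i rather than to census data.

```python
def census_to_positive(n_total, a, b, c, ab_not_c, ac_not_b, bc_not_a, abc):
    """Turn the census-style set of § 18 into positive class-frequencies.

    Arguments, in order: N, (A), (B), (C), (ABγ), (AβC), (αBC), (ABC).
    N is not one of the tabulated classes and must be supplied separately.
    """
    return {
        "N": n_total,
        "A": a, "B": b, "C": c,
        "AB": ab_not_c + abc,  # (AB) = (ABγ) + (ABC)
        "AC": ac_not_b + abc,  # (AC) = (AβC) + (ABC)
        "BC": bc_not_a + abc,  # (BC) = (αBC) + (ABC)
        "ABC": abc,
    }


# Census-style figures formed from the ultimate frequencies of Example i, § 13.
result = census_to_positive(10000, 877, 1086, 286, 281, 86, 78, 57)
print(result)
```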

19. The symbols of our notation are, it should be remarked,
used in an inclusive sense, the symbol A, for example, signifying
an object or individual possessing the attribute A with or without
others. This seems to be the only natural use of the symbol,
but at least one notation has been constructed on an exclusive
basis (cf. ref. (5)), the symbol A denoting that the object or individual possesses the attribute A, but not B or C or D, or whatever other attributes have been noted. An exclusive notation is
apt to be relatively cumbrous and also ambiguous, for the reader
cannot know what attributes a given symbol excludes until he
has seen the whole list of attributes of which note has been
taken, and this list he must bear in mind. The statement that
the symbol A is used exclusively cannot mean, obviously, that the
object referred to possesses only the attribute A and no others
        <pb n="35" />
whatever; it merely excludes the other attributes noted in the
particular investigation. Adjectives, as well as the symbols which
may represent them, are naturally used in an inclusive sense, and
care should therefore be taken, when classes are verbally described,
that the description is complete, and states what, if anything, is
excluded as well as what is included, in the same way as our
notation. The terminology of the English census has not, in
this respect, been quite clear. The “Blind” includes those who
are “Blind and Dumb,” or “Blind, Dumb, and Lunatic,” and so
forth. But the heading “Blind and Dumb,” in the table relating
to “combined infirmities,” is used in the sense “Blind and Dumb,
but not Lunatic or Imbecile,” etc., and so on for the others. In
the first table the headings are inclusive, in the second exclusive.

REFERENCES.

(1) JEVONS, W. STANLEY, “On a General System of Numerically Definite
Reasoning,” Memoirs of the Manchester Lit. and Phil. Soc., 1870.
Reprinted in Pure Logic and other Minor Works; Macmillan, 1890.
(The method used in these chapters is that of Jevons, with the notation
slightly modified to that employed in the next three memoirs cited.)

(2) YULE, G. U., “On the Association of Attributes in Statistics, etc.,” Phil.
Trans. Roy. Soc., Series A, vol. cxciv., 1900, p. 257.

(3) YULE, G. U., “On the Theory of Consistence of Logical Class-frequencies
and its Geometrical Representation,” Phil. Trans. Roy. Soc., Series A,
vol. cxcvii., 1901, p. 91.

(4) YULE, G. U., “Notes on the Theory of Association of Attributes in
Statistics,” Biometrika, vol. ii., 1903, p. 121. (The first three sections
of (4) are an abstract of (2) and (3). The remarks made as regards the
tabulation of class-frequencies at the end of (2) should be read in connection with the remarks made at the beginning of (3) and in this
chapter: cf. footnote on p. 94 of (3).)

Material has been cited from, and reference made to the notation used in—

(5) WARNER, F., and others, “Report on the Scientific Study of the Mental and
Physical Conditions of Childhood”; published by the Committee,
Parkes Museum, 1895.

(6) WARNER, F., “Mental and Physical Conditions among Fifty Thousand
Children, etc.,” Jour. Roy. Stat. Soc., vol. lix., 1896, p. 125.

EXERCISES.

1. (Figures from ref. (5).) The following are the numbers of boys observed
with certain classes of defects amongst a number of school-children. A
denotes development defects; B, nerve signs; C, low nutrition.

(ABC)     149      (αBC)     204
(ABγ)     738      (αBγ)   1,762
(AβC)     225      (αβC)     171
(Aβγ)   1,196      (αβγ)  21,842
Find the frequencies of the positive classes.

        <pb n="36" />
2. (Figures from ref. (5).) The following are the frequencies of the
positive classes for the girls in the same investigation:—
N     23,713     (AB)   587
(A)    1,618     (AC)   428
(B)    2,015     (BC)   335
(C)      770     (ABC)  156
Find the frequencies of the ultimate classes.

3. (Figures from Census, England and Wales, 1891, vol. iii.) Convert the
census statement as below into a statement in terms of (a) the positive, (b)
the ultimate class-frequencies. A = blindness, B = deaf-mutism, C = mental
derangement.

N     29,002,525     (ABγ)   82
(A)       23,467     (AβC)  380
(B)       14,192     (αBC)  500
(C)       97,383     (ABC)   25

4. (Cf. Mill’s Logic, bk. iii., ch. xvii., and ref. (1).) Show that if A
occurs in a larger proportion of the cases where B is than where B is not,
then will B occur in a larger proportion of the cases where A is than where
A is not: i.e. given (AB)/(B) &gt; (Aβ)/(β), show that (AB)/(A) &gt; (αB)/(α).

5. (Cf. De Morgan, Formal Logic, p. 163, and ref. (1).) Most B’s are A’s,
most B’s are C’s: find the least number of A’s that are C’s, i.e. the lowest
possible value of (AC).

6. Given that

(A) = (α) = (B) = (β) = ½N,
show that
(AB) = (αβ), (Aβ) = (αB).
7. (Cf. ref. (2), § 9, “Case of equality of contraries.”) Given that
(A) = (α) = (B) = (β) = (C) = (γ) = ½N,
and also that
(ABC) = (αβγ),
show that
2(ABC) = (AB) + (AC) + (BC) − ½N.

8. Measurements are made on a thousand husbands and a thousand wives.
If the measurements of the husbands exceed the measurements of the wives in
800 cases for one measurement, in 700 cases for another, and in 660 cases for
both measurements, in how many cases will both measurements on the wife
exceed the measurements on the husband ?

        <pb n="37" />
        CHAPTER II.
CONSISTENCE.

1-3. The field of observation or universe and its specification by symbols—
4. Derivation of complex from simple relations by specifying the
universe—5-6. Consistence—7-10. Conditions of consistence for one
and for two attributes—11-14. Conditions of consistence for three
attributes.

1. Any statistical inquiry is necessarily confined to a certain
time, space, or material. An investigation on the prevalence of
insanity, for instance, may be limited to England, to England in
1901, to English males in 1901, or even to English males over 60
years of age in 1901, and so on.

For actual work on any given subject, no term is required to
denote the material to which the work is so confined: the limits
are specified, and that is sufficient. But for theoretical purposes
some term is almost essential to avoid circumlocution. The ex-
pression the universe of discourse, or simply the universe, used
in this sense by writers on logic, may be adopted as familiar and
convenient.

2. The universe, like any class, may be considered as specified
by an enumeration of the attributes common to all its members,
e.g. to take the illustration of § 1, those implied by the predicates
English, male, over 60 years of age, living in 1901. It is not, in
general, necessary to introduce a special letter into the class-
symbols to denote the attributes common to all members of the
universe. We know that such attributes must exist, and the
common symbol can be understood.

In strictness, however, the symbol ought to be written: if, say, U denote the combination of attributes English, male, over 60, living in 1901, A insanity, and B blindness, we should strictly use the symbols

(U) = number of English males over 60 living in 1901,
(UA) = number of insane English males over 60 living in 1901,
(UB) = number of blind English males over 60 living in 1901,
(UAB) = number of blind and insane English males over 60 living in 1901,

        <pb n="38" />
instead of the simpler symbols N, (A), (B), (AB). Similarly, the general relations (2), § 13, Chap. I, using U to denote the common attributes of all the members of the universe and (U) consequently the total number of observations N, should in strictness be written in the form

$$(U) = (UA) + (U\alpha) = (UB) + (U\beta) = \text{etc.} = (UAB) + (UA\beta) + (U\alpha B) + (U\alpha\beta) = \text{etc.}$$

$$(UA) = (UAB) + (UA\beta) = (UAC) + (UA\gamma) = \text{etc.}$$

$$(UAB) = (UABC) + (UAB\gamma) = \text{etc.}$$

3. Clearly, however, we might have used any other symbol instead of U to denote the attributes common to all the members of the universe, e.g. A or B or AB or ABC, writing in the latter case

$$(ABC) = (ABCD) + (ABC\delta)$$
and so on. Hence any attribute or combination of attributes common to all the class-symbols in an equation may be regarded as specifying the universe within which the equation holds good. Thus the equation just written may be read in words: "The number of objects or individuals in the universe ABC is equal to the number of D's together with the number of not-D's within the same universe." The equation
$$(AC) = (ABC) + (A\beta C)$$

may be read: "The number of A's is equal to the number of A's that are B together with the number of A's that are not-B within the universe C."

4. The more complex relations between class-frequencies may be derived very readily from the simpler by the process of specifying the universe. Thus, starting from the simple equation

$$(\alpha) = N - (A),$$
we have, by specifying the universe as β,
$$(\alpha\beta) = (\beta) - (A\beta) = N - (A) - (B) + (AB).$$
Specifying the universe, again, as γ, we have
$$(\alpha\beta\gamma) = (\gamma) - (A\gamma) - (B\gamma) + (AB\gamma) = N - (A) - (B) - (C) + (AB) + (AC) + (BC) - (ABC).$$
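The expansion of (αβγ) can be checked by direct counting. The following Python sketch is only an illustration under stated assumptions: it builds a hypothetical random population in which each individual either has or lacks A, B and C, counts every class directly, and confirms that (αβγ) equals N - (A) - (B) - (C) + (AB) + (AC) + (BC) - (ABC). The helper name `count` is invented here.

```python
import random

def count(population, **attrs):
    """Number of individuals matching every attribute given as a keyword.

    Each individual is a dict such as {"A": True, "B": False, "C": True};
    False stands for the contrary attribute (alpha, beta or gamma).
    """
    return sum(all(ind[k] == v for k, v in attrs.items()) for ind in population)

random.seed(1)
# Hypothetical population of 1000 individuals with randomly drawn attributes.
population = [
    {"A": random.random() > 0.5, "B": random.random() > 0.3, "C": random.random() > 0.6}
    for _ in range(1000)
]

N = count(population)  # no attribute specified: the whole universe
direct = count(population, A=False, B=False, C=False)
expansion = (
    N
    - count(population, A=True) - count(population, B=True) - count(population, C=True)
    + count(population, A=True, B=True)
    + count(population, A=True, C=True)
    + count(population, B=True, C=True)
    - count(population, A=True, B=True, C=True)
)
print(direct, expansion)  # the two numbers agree
```

The identity holds for any population, so the seed and the probabilities matter only in that they make the run reproducible.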

5. Any class-frequencies which have been or might have been observed within one and the same universe may be said to be

        <pb n="39" />
consistent with one another. They conform with one another,
and do not in any way conflict.

The conditions of consistence are some of them simple, but
others are by no means of an intuitive character. Suppose, for
instance, the data are given:

N = 1000       (AB) = 42
(A) = 525      (AC) = 147
(B) = 312      (BC) = 86
(C) = 470      (ABC) = 25

There is nothing obviously wrong with the figures. Yet they are certainly inconsistent. They might have been observed at different times, in different places or on different material, but they cannot have been observed in one and the same universe. They imply, in fact, a negative value for (αβγ):

$$(\alpha\beta\gamma) = 1000 - 525 - 312 - 470 + 42 + 147 + 86 - 25 = 1000 - 1307 + 275 - 25 = -57.$$

Clearly no class-frequency can be negative. If the figures,
consequently, are alleged to be the result of an actual inquiry in
a definite universe, there must have been some miscount or
misprint.

6. Generally, then, we may say that any given class-frequencies
are inconsistent if they imply negative values for any of the
unstated frequencies. Otherwise they are consistent. To test the consistence of any set of 2ⁿ algebraically independent frequencies, for the case of n attributes, we should accordingly calculate the values of all the unstated frequencies, and so verify the fact that none of them is negative. This procedure may, however, be shortened by a simple consideration. If the ultimate class-frequencies are not negative, no other frequency can be, since every other frequency is derived from the ultimate frequencies by simple addition. Hence we need only calculate the values of the ultimate class-frequencies in terms of those given, and verify the fact that none of them is less than zero.
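The shortened test can be sketched in Python for three attributes. This is an illustration, not Yule's own procedure. The function names and the string keys are hypothetical conventions chosen here: "" stands for N, "AB" for (AB), and so on. Each ultimate frequency is found by writing every contrary as a difference (α = N - A, and so on) and expanding, which is the pattern used in § 4.

```python
from itertools import product

def ultimate_frequencies(pos):
    """Ultimate class-frequencies for three attributes A, B, C.

    `pos` maps each positive class ("", "A", "B", "C", "AB", "AC", "BC", "ABC")
    to its frequency, "" being the whole number N. Each ultimate class is keyed
    by a tuple of booleans (A present?, B present?, C present?).
    """
    result = {}
    for signs in product([True, False], repeat=3):
        present = [name for name, s in zip("ABC", signs) if s]
        absent = [name for name, s in zip("ABC", signs) if not s]
        total = 0
        # Expand each contrary: add or subtract every subset of the absent
        # attributes, with sign (-1) raised to the size of the subset.
        for mask in product([False, True], repeat=len(absent)):
            extra = [name for name, m in zip(absent, mask) if m]
            key = "".join(sorted(present + extra))
            total += (-1) ** len(extra) * pos[key]
        result[signs] = total
    return result

def is_consistent(pos):
    """The data are consistent if no ultimate frequency is negative."""
    return min(ultimate_frequencies(pos).values()) >= 0

# The data of section 5.
data = {"": 1000, "A": 525, "B": 312, "C": 470,
        "AB": 42, "AC": 147, "BC": 86, "ABC": 25}
ult = ultimate_frequencies(data)
print(ult[(False, False, False)])  # (alpha beta gamma): -57
print(is_consistent(data))         # False
```

The eight ultimate frequencies always add up to N. This gives a quick check on the expansion.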

7. As we saw in the last chapter, there are two sets of 2ⁿ algebraically independent frequencies of practical importance, viz. (1) the ultimate, (2) the positive class-frequencies.

It follows from what we have just said that there is only one condition of consistence for the ultimate frequencies, viz. that none of them may be less than zero. Apart from this, any one frequency of the set may vary anywhere between 0 and ∞ without becoming inconsistent with the others.

For the positive class-frequencies, the conditions may be

        <pb n="40" />
expressed symbolically by expanding the ultimate in terms of
the positive frequencies, and writing each such expansion not
less than zero. We will consider the cases of one, two, and
three attributes in turn.

8. If only one attribute be noted, say A, the positive frequencies are N and (A). The ultimate frequencies are (A) and (α), where

$$(\alpha) = N - (A).$$
The conditions of consistence are therefore simply
$$(A) \geq 0, \qquad N - (A) \geq 0,$$
or, more conveniently expressed,
$$\text{(a)}\ (A) \geq 0, \qquad \text{(b)}\ (A) \leq N. \qquad (1)$$

These conditions are obvious: the number of A's cannot be less than zero, nor exceed the whole number of observations.

9. If two attributes be noted there are four ultimate frequencies, (AB), (Aβ), (αB), (αβ). The following conditions are given by expanding each in terms of the frequencies of positive classes:

(a) $(AB) \geq 0$, or $(AB)$ would be negative;
(b) $(AB) \geq (A) + (B) - N$, or $(\alpha\beta)$ would be negative;
(c) $(AB) \leq (A)$, or $(A\beta)$ would be negative;
(d) $(AB) \leq (B)$, or $(\alpha B)$ would be negative.      (2)

(a), (c), and (d) are obvious; (b) is perhaps a little less obvious, and is occasionally forgotten. It is, however, of precisely the same type as the other three. None of these conditions is really of a new form; they may be derived at once from (1)(a) and (1)(b) by specifying the universe as B or as β respectively. The conditions (2) are therefore really covered by (1).
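Taken together, conditions (2) confine (AB) between a lower limit, the greater of 0 and (A) + (B) - N, and an upper limit, the lesser of (A) and (B). Here is a minimal Python sketch. The helper name `ab_limits` is hypothetical, and the second set of figures is purely illustrative.

```python
def ab_limits(n, a, b):
    """Range of (AB) allowed by conditions (2), given N, (A) and (B)."""
    lower = max(0, a + b - n)  # (2)(a) and (2)(b): minor limits
    upper = min(a, b)          # (2)(c) and (2)(d): major limits
    return lower, upper

print(ab_limits(1000, 525, 312))  # (0, 312)
print(ab_limits(100, 70, 85))     # (55, 70): (AB) cannot fall below 55
```

So long as (A) and (B) themselves satisfy (1), the lower limit never exceeds the upper one, which is the point taken up in § 10.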

10. But a further point arises as regards such a system of limits as is given by (2). The conditions (a) and (b) give lower or minor limits to the value of (AB); (c) and (d) give upper or major limits. If either major limit be less than either minor limit the conditions are impossible, and it is necessary to see whether (A) and (B) can take such values that this may be the case.

Expressing the condition that the major limits (A) and (B) must each be not less than the minor limits 0 and (A) + (B) - N, we have

$$(A) \geq 0, \quad (B) \geq 0, \quad (A) \leq N, \quad (B) \leq N.$$
These are simply the conditions of the form (1). If, therefore, (A) and (B) fulfil the conditions (1), the conditions (2) must be

        <pb n="41" />
possible. The conditions (1) and (2) therefore give all the con-
ditions of consistence for the case of two attributes, conditions of
an extremely simple and obvious kind.

11. Now consider the case of three attributes. There are
eight ultimate frequencies. Expanding the ultimate in terms of
the positive frequencies, and expressing the condition that each
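expansion is not less than zero, we have the following conditions. Each is followed, in brackets, by the ultimate frequency that would be negative if the condition failed:

(a) $(ABC) \geq 0$      [$(ABC)$]
(b) $(ABC) \geq (AB) + (AC) - (A)$      [$(A\beta\gamma)$]
(c) $(ABC) \geq (AB) + (BC) - (B)$      [$(\alpha B\gamma)$]
(d) $(ABC) \geq (AC) + (BC) - (C)$      [$(\alpha\beta C)$]
(e) $(ABC) \leq (AB)$      [$(AB\gamma)$]
(f) $(ABC) \leq (AC)$      [$(A\beta C)$]
(g) $(ABC) \leq (BC)$      [$(\alpha BC)$]
(h) $(ABC) \leq (AB) + (AC) + (BC) - (A) - (B) - (C) + N$      [$(\alpha\beta\gamma)$]      (3)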

These, again, are not conditions of a new form. We leave it as an exercise for the student to show that they may be derived from (1)(a) and (1)(b) by specifying the universe in turn as BC, Bγ, βC, and βγ. The two conditions holding in four universes give the eight inequalities above.

12. As in the last case, however, these conditions will be impossible to fulfil if any one of the major limits (e) to (h) be less than any one of the minor limits (a) to (d). The values on the right must be such as to make no major limit less than a minor.

There are four major and four minor limits, or sixteen comparisons in all to be made. But twelve of these, the student will find, only lead back to conditions of the form (2) for (AB), (AC), and (BC) respectively. The four comparisons of expansions due to contrary frequencies ((a) and (h), (b) and (g), (c) and (f), (d) and (e)) alone lead to new conditions, viz.

(a) $(AB) + (AC) + (BC) \geq (A) + (B) + (C) - N$
(b) $(AB) + (AC) - (BC) \leq (A)$
(c) $(AB) - (AC) + (BC) \leq (B)$
(d) $-(AB) + (AC) + (BC) \leq (C)$      (4)

13. These are conditions of a wholly new type, not derivable
in any way from those given under (1) and (2). They are con-
ditions for the consistence of the second-order frequencies with
each other, whilst the inequalities of the form (2) are only conditions
for the consistence of the second-order frequencies with those of
lower orders. Given any two of the second-order frequencies, e.g.

        <pb n="42" />

(AB) and (AC), the conditions (4) give limits for the third, viz. (BC). They thus replace, for statistical purposes, the ordinary rules of syllogistic inference. From data of the syllogistic form, they would, of course, lead to the same conclusion, though in a somewhat cumbrous fashion; one or two cases are suggested as exercises for the student (Questions 6 and 7). The following will serve as illustrations of the statistical uses of the conditions:

Example i.—Given that (A) = (B) = (C) = ½N, and that 80 per cent. of the A's are B and 75 per cent. of the A's are C, find the limits to the percentage of B's that are C. The data are

$$\frac{2(AB)}{N} = 0.8, \qquad \frac{2(AC)}{N} = 0.75,$$
and the conditions (4), divided through by ½N, give

(a) $2(BC)/N \geq 1 - 0.8 - 0.75$
(b) $2(BC)/N \geq 0.8 + 0.75 - 1$
(c) $2(BC)/N \leq 1 - 0.8 + 0.75$
(d) $2(BC)/N \leq 1 + 0.8 - 0.75$

(a) gives a negative limit and (d) a limit greater than unity; hence they may be disregarded. From (b) and (c), since 2(BC)/N = (BC)/(B), we have

$$0.55 \leq \frac{(BC)}{(B)} \leq 0.95,$$
that is to say, not less than 55 per cent. nor more than 95 per cent. of the B's can be C.
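The working of Example i can be reproduced in a short Python sketch. The assumptions are these: the helper `bc_limits` is a hypothetical name, it applies only conditions (4), and exact fractions are used so that 0.8 + 0.75 - 1 does not pick up floating-point error. Taking N = 1 turns every frequency into a proportion.

```python
from fractions import Fraction as F

def bc_limits(n, a, b, c, ab, ac):
    """Lower and upper limits to (BC) from conditions (4), given the rest."""
    lower = max(a + b + c - n - ab - ac,  # (4)(a)
                ab + ac - a)              # (4)(b)
    upper = min(b - ab + ac,              # (4)(c)
                c + ab - ac)              # (4)(d)
    return lower, upper

# Example i, with N = 1 so that every frequency is a proportion.
n = F(1)
a = b = c = n / 2
ab = F(80, 100) * a  # 80 per cent. of the A's are B
ac = F(75, 100) * a  # 75 per cent. of the A's are C
lo, hi = bc_limits(n, a, b, c, ab, ac)
print(float(lo / b), float(hi / b))  # 0.55 0.95: share of the B's that are C
```

Dividing by (B) = ½N converts the limits on (BC) into the limits on the percentage of B's that are C, as in the text.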

Example ii.—If a report give the following frequencies as actually observed, show that there must be a misprint or mistake of some sort, and that possibly the misprint consists in the dropping of a 1 before the 85 given as the frequency (BC).

N = 1000
(A) = 510      (AB) = 189
(B) = 490      (AC) = 140
(C) = 427      (BC) = 85

From (4)(a) we have

$$(BC) \geq 510 + 490 + 427 - 1000 - 189 - 140 = 98.$$

But 85 &lt; 98; therefore it cannot be the correct value of (BC). If we read 185 for 85 all the conditions are fulfilled.
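A reported table can be screened against conditions (4) in a few lines. This is a sketch only: the function name is hypothetical, and it checks the four conditions (4) but not conditions (2), which the corrected figure 185 also satisfies.

```python
def violated_conditions(n, a, b, c, ab, ac, bc):
    """Return the labels of conditions (4) that the given frequencies break."""
    checks = {
        "(a)": ab + ac + bc >= a + b + c - n,
        "(b)": a >= ab + ac - bc,
        "(c)": b >= ab - ac + bc,
        "(d)": c >= -ab + ac + bc,
    }
    return [label for label, ok in checks.items() if not ok]

print(violated_conditions(1000, 510, 490, 427, 189, 140, 85))   # ['(a)']
print(violated_conditions(1000, 510, 490, 427, 189, 140, 185))  # []
```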

        <pb n="43" />

Example iii.—In a certain set of 1000 observations (A) = 45, (B) = 23, (C) = 14. Show that, whatever the percentages of B's that are A and of C's that are A, it cannot be inferred that any B's are C.

The conditions (4)(a) and (4)(b) give the lower limit of (BC), which is required. We find

(a) $\dfrac{(BC)}{N} \geq -\dfrac{(AB)}{N} - \dfrac{(AC)}{N} - 0.918$
(b) $\dfrac{(BC)}{N} \geq \dfrac{(AB)}{N} + \dfrac{(AC)}{N} - 0.045$

The first limit is clearly negative. The second must also be negative, since (AB)/N cannot exceed 0.023 nor (AC)/N 0.014, so that their sum is at most 0.037. Hence we cannot conclude that there is any limit to (BC) greater than 0. This result is indeed immediately obvious when we consider that, even if all the B's were A, and of the remaining 22 A's 14 were C's, there would still be 8 A's that were neither B nor C.
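The argument of Example iii can be confirmed by brute force. The sketch below (the function name is hypothetical) tries every value of (AB) and (AC) that conditions (2) permit. For each pair it takes the larger of the two lower limits (4)(a) and (4)(b), and it reports the greatest such limit found.

```python
def greatest_lower_limit(n, a, b, c):
    """Greatest lower limit to (BC) given by (4)(a) and (4)(b), searched over
    every value of (AB) and (AC) permitted by conditions (2)."""
    best = None
    for ab in range(max(0, a + b - n), min(a, b) + 1):
        for ac in range(max(0, a + c - n), min(a, c) + 1):
            limit = max(a + b + c - n - ab - ac,  # (4)(a)
                        ab + ac - a)              # (4)(b)
            best = limit if best is None else max(best, limit)
    return best

print(greatest_lower_limit(1000, 45, 23, 14))  # -8: never above zero
```

The value -8 matches the closing remark of the example: at best, 8 A's are left over that are neither B nor C.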

14. The student should note the result of the last example, as it illustrates the sort of result at which one may often arrive by applying the conditions (4) to practical statistics. For given values of N, (A), (B), (C), (AB), and (AC), it will often happen that any value of (BC) not less than zero (or, more generally, not less than either of the lower limits (2)(a) and (2)(b)) will satisfy the conditions (4), and hence no true inference of a lower limit is possible. The argument of the type "So many A's are B and so many B's are C that we must expect some A's to be C" must be used with caution.

REFERENCES.

(1) Morgan, A. de, Formal Logic, 1847 (chapter viii., "On the Numerically Definite Syllogism").

(2) Boole, G., Laws of Thought, 1854 (chapter xix., "Of Statistical Conditions").

These two are the classical works with respect to the general theory of numerical consistence. The student will find both difficult to follow on account of their special notation, and, in the case of Boole's work, the special method employed.

(3) Yule, G. U., "On the Theory of Consistence of Logical Class-frequencies and its Geometrical Representation," Phil. Trans., A, vol. cxcvii. (1901), p. 91. (Deals at length with the theory of consistence for any number of attributes, using the notation of the present chapters.)

        <pb n="44" />
EXERCISES.

1. (For this and similar estimates cf. "Report by Miss Collet on the Statistics of Employment of Women and Girls" [C.—7564] 1894.) If, in the urban district of Bury, 817 per thousand of the women between 20 and 25 years of age were returned as "occupied" at the census of 1891, and 263 per thousand as married or widowed, what is the lowest proportion per thousand of the married or widowed that must have been occupied?

2. If, in a series of houses actually invaded by small-pox, 70 per cent. of the
inhabitants are attacked and 85 per cent. have been vaccinated, what is the
lowest percentage of the vaccinated that must have been attacked ?

3. Given that 50 per cent. of the inmates of a workhouse are men, 60 per cent. are "aged" (over 60), 80 per cent. non-able-bodied, 35 per cent. aged men, 45 per cent. non-able-bodied men, and 42 per cent. non-able-bodied and aged, find the greatest and least possible proportions of non-able-bodied aged men.

4. (Material from ref. 5 of Chap. I.) The following are the proportions per 10,000 of boys observed, with certain classes of defects, amongst a number of school-children. A = development defects, B = nerve signs, D = mental dulness.

N = 10,000       (D) = 739
(A) = 877        (AB) = 338
(B) = 1,086      (BD) = 455

Show that some dull boys do not exhibit development defects, and state how many at least do not do so.

5. The following are the corresponding figures for girls : —

N =10,000 (D) =689

(4)= 682 (4B)=248

(B)= "850 (BD) =3863
Show that some defectively developed girls are not dull, and state how many
at least must be so.

6. Take the syllogism "All A's are B, all B's are C, therefore all A's are C," express the premisses in terms of the notation of the preceding chapters, and deduce the conclusion by the use of the general conditions of consistence.

7. Do the same for the syllogism "All A's are B, no B's are C, therefore no A's are C."

8. Given that (A) = (B) = (C) = ½N, and that (AB)/N = (AC)/N = p, find what must be the greatest or least values of p in order that we may infer that (BC)/N exceeds any given value, say q.

9. Show that if ry &amp; ©

4) _ = 2
52 Fi 2% Nr 3z
(4B)_(4C)_(BO)_
: ENE Ey
the value of neither « nor 7 can exceed %.

        <pb n="45" />
        CHAPTER III.
ASSOCIATION.

1-4. The criterion of independence—5-10. The conception of association and testing for the same by the comparison of percentages—11-12. Numerical equality of the differences between the four second-order frequencies and their independence values—13. Coefficients of association—14. Necessity for an investigation into the causation of an attribute A being extended to include non-A's.

1. If there is no sort of relationship, of any kind, between two attributes A and B, we expect to find the same proportion of A's amongst the B's as amongst the non-B's. We may anticipate, for instance, the same proportion of abnormally wet seasons in leap years as in ordinary years, the same proportion of male to total births when the moon is waxing as when it is waning, the same proportion of heads whether a coin be tossed with the right hand or the left.

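Two such unrelated attributes may be termed independent, and we have accordingly as the criterion of independence for A and B

$$\frac{(AB)}{(B)} = \frac{(A\beta)}{(\beta)}. \qquad (1)$$
If this relation hold good, the corresponding relations
$$\frac{(\alpha B)}{(B)} = \frac{(\alpha\beta)}{(\beta)}, \qquad \frac{(AB)}{(A)} = \frac{(\alpha B)}{(\alpha)}, \qquad \frac{(A\beta)}{(A)} = \frac{(\alpha\beta)}{(\alpha)}$$
must also hold. For it follows at once from (1) that
$$\frac{(B) - (AB)}{(B)} = \frac{(\beta) - (A\beta)}{(\beta)},$$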

        <pb n="46" />
that is,
$$\frac{(\alpha B)}{(B)} = \frac{(\alpha\beta)}{(\beta)},$$
and the other two identities may be similarly deduced.

The student may find it easier to grasp the nature of the rela-
tions stated if the frequencies are supposed grouped into a table
with two rows and two columns, thus:

                   Attribute.
Attribute.       B          β          Total.
A              (AB)       (Aβ)        (A)
α              (αB)       (αβ)        (α)
Total          (B)        (β)         N

Equation (1) states a certain equality for the columns; if this holds good, the corresponding equation
$$\frac{(AB)}{(A)} = \frac{(\alpha B)}{(\alpha)}$$

must hold for the rows, and so on.

2. The criterion may, however, be put into a somewhat different and theoretically more convenient form. The equation (1) expresses (AB) in terms of (B), (β), and a second-order frequency (Aβ). We can eliminate this second-order frequency by using the fact that, when two fractions are equal, each is equal to the fraction formed by adding their numerators and their denominators. This gives

$$\frac{(AB)}{(B)} = \frac{(AB) + (A\beta)}{(B) + (\beta)} = \frac{(A)}{N},$$
i.e. in words, "the proportion of A's amongst the B's is the same as in the universe at large." The student should learn to recognise this equation at sight in any of the forms

(a) $\dfrac{(AB)}{(B)} = \dfrac{(A)}{N}$
(b) $\dfrac{(AB)}{(A)} = \dfrac{(B)}{N}$
(c) $(AB) = \dfrac{(A)(B)}{N}$
(d) $\dfrac{(AB)}{N} = \dfrac{(A)}{N} \times \dfrac{(B)}{N}$      (2)

The equation (d) gives the important fundamental rule: If the attributes A and B are independent, the proportion of AB's in the universe is equal to the proportion of A's multiplied by the proportion of B's.

        <pb n="47" />

The advantage of the forms (2) over the form (1) is that they
give expressions for the second-order frequency in terms of the
frequencies of the first order and the whole number of observa-
tions alone; the form (1) does not.

Example i.—If there are 144 A's and 384 B's in 1024 observations, how many AB's will there be, A and B being independent?

$$\frac{144 \times 384}{1024} = 54.$$
There will therefore be 54 AB's.

Example ii.—If the A's are 60 per cent., the B's 35 per cent., of the whole number of observations, what must be the percentage of AB's in order that we may conclude that A and B are independent?

$$\frac{60 \times 35}{100} = 21,$$
and therefore there must be 21 per cent. (more or less closely, cf. §§ 7, 8 below) of AB's in the universe to justify the conclusion that A and B are independent.
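Form (2)(c) of the criterion is a one-line calculation. Here is a sketch using exact fractions, so that a whole-number result prints as such. The helper name is hypothetical. Example ii is treated as a universe of 100, so that frequencies become percentages.

```python
from fractions import Fraction

def independence_value(n, a, b):
    """Value of (AB) to be expected if A and B are independent: (A)(B)/N."""
    return Fraction(a * b, n)

print(independence_value(1024, 144, 384))  # Example i.: 54
print(independence_value(100, 60, 35))     # Example ii.: 21 (per cent.)
```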

3. It follows from § 1 that if the relation (2) holds for any one of the four second-order frequencies, e.g. (AB), similar relations must hold for the remaining three. Thus we have directly from (1)

$$\frac{(A\beta)}{(\beta)} = \frac{(AB) + (A\beta)}{(B) + (\beta)} = \frac{(A)}{N},$$
giving
$$(A\beta) = \frac{(A)(\beta)}{N},$$
and so on. This is seen at once to be true on consideration of the fourfold table on p. 26. For if (AB) takes the value (A)(B)/N, (Aβ) must take the value (A)(β)/N to keep the total of the row equal to (A), and so on for the other rows and columns. The fourfold table in the case of independence must in fact have the form

                   Attribute.
Attribute.       B              β              Total.
A              (A)(B)/N       (A)(β)/N        (A)
α              (α)(B)/N       (α)(β)/N        (α)
Total          (B)            (β)             N

        <pb n="48" />
Example iii.—In Example i. above, what would be the number of αβ's, A and B being independent?
$$(\alpha) = 1024 - 144 = 880$$
$$(\beta) = 1024 - 384 = 640$$
$$(\alpha\beta) = \frac{880 \times 640}{1024} = 550.$$
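The whole fourfold table for the case of independence follows from N, (A) and (B) alone. A sketch under stated assumptions: the function name and the string labels for the four cells are hypothetical.

```python
from fractions import Fraction

def independence_table(n, a, b):
    """Fourfold table to be expected if A and B are independent."""
    alpha, beta = n - a, n - b
    return {
        "AB": Fraction(a * b, n),
        "A-beta": Fraction(a * beta, n),
        "alpha-B": Fraction(alpha * b, n),
        "alpha-beta": Fraction(alpha * beta, n),
    }

table = independence_table(1024, 144, 384)
print(table["alpha-beta"])  # Example iii.: 550
```

As § 3 notes, the rows and columns of such a table automatically add up to (A), (α), (B), (β) and N.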

4. Finally, the criterion of independence may be expressed in yet a third form, viz. in terms of the second-order frequencies alone. If A and B are independent, it follows at once from the preceding section that

$$(AB)(\alpha\beta) = \frac{(A)(B)(\alpha)(\beta)}{N^2}.$$
And evidently (αB)(Aβ) is equal to the same fraction. Therefore

(a) $(AB)(\alpha\beta) = (\alpha B)(A\beta)$
(b) $\dfrac{(AB)}{(\alpha B)} = \dfrac{(A\beta)}{(\alpha\beta)}$
(c) $\dfrac{(AB)}{(A\beta)} = \dfrac{(\alpha B)}{(\alpha\beta)}$

The equation (b) may be read "The ratio of A's to α's amongst the B's is equal to the ratio of A's to α's amongst the β's," and (c) similarly.

This form of criterion is a convenient one if all the four second-
order frequencies are given, enabling one to recognise almost at a
glance whether or not the two attributes are independent.

Example iv.—If the second-order frequencies have the following values, are A and B independent or not?

(AB) = 110    (αB) = 90    (Aβ) = 290    (αβ) = 510.
Clearly (AB)(αβ) > (αB)(Aβ), the products being 56,100 and 26,100, so A and B are not independent.
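Criterion (a) of § 4 compares two cross-products and needs no first-order totals. A minimal sketch follows; the function name is hypothetical. The second call uses the independence table of Example iii (54, 330, 90, 550), which should pass.

```python
def independent(ab, alpha_b, a_beta, alpha_beta):
    """True when (AB)(alpha beta) equals (alpha B)(A beta), criterion (a) of section 4."""
    return ab * alpha_beta == alpha_b * a_beta

print(110 * 510, 90 * 290)              # 56100 26100
print(independent(110, 90, 290, 510))   # False: Example iv.
print(independent(54, 330, 90, 550))    # True: the table of Example iii.
```

In real records exact equality is rare; §§ 7, 8 explain why a small difference need not mean anything.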

5. Suppose now that A and B are not independent, but related in some way or other, however complicated. Then if

$$(AB) > \frac{(A)(B)}{N},$$

A and B are said to be positively associated, or sometimes simply associated. If, on the other hand,
$$(AB) &lt; \frac{(A)(B)}{N},$$

        <pb n="49" />
A and B are said to be negatively associated or, more briefly,
disassociated.

The student should notice that these words are not used
exactly in their ordinary senses, but in a technical sense. When
A and B are said to be associated, it is not meant merely that
some A’s are B’s, but that the number of A’s which are B’s exceeds
the number to be expected if A and B are independent. Similarly,
when A and B are said to be negatively associated or disassociated, it is not meant that no A's are B's, but that the number of A's which are B's falls short of the number to be expected if A and B are independent. "Association" cannot be inferred from the mere fact that some A's are B's, however great that proportion; this principle is fundamental, and should always be borne in mind.

6. The greatest possible value of (AB) for given values of N, (A), and (B) is either (A) or (B) (whichever is the less). When (AB) attains either of these values, A and B may be said to be completely or perfectly associated. The lowest possible value of (AB), on the other hand, is either zero or (A) + (B) - N (whichever is the greater). When (AB) falls to either of these values, A and B may be said to be completely disassociated. Complete association is generally understood to correspond to one or other of the cases "All A's are B" or "All B's are A," or it may be more narrowly defined as corresponding only to the case when both these statements are true. Complete disassociation may be similarly taken as corresponding to one or other of the cases "No A's are B" or "No α's are β," or more narrowly to the case when both these statements are true. The greater the divergence of (AB) from the value (A)(B)/N towards the limiting value in either direction, the greater, we may say, is the intensity of association or of disassociation, so that we may speak of attributes being more or less, highly or slightly associated. This conception of degrees of association, degrees which may in fact be measured by certain formulae (cf. § 13), is important.
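The definitions of §§ 5 and 6 can be combined into a small classifier. The following Python sketch is illustrative only. The function name and the labels it returns are hypothetical. It ignores degenerate universes in which (A) or (B) equals 0 or N, since there the limits coincide with the independence value.

```python
from fractions import Fraction

def describe_association(n, a, b, ab):
    """Place (AB) relative to its independence value and its possible limits."""
    expected = Fraction(a * b, n)        # independence value (A)(B)/N
    lowest = max(0, a + b - n)           # complete disassociation
    greatest = min(a, b)                 # complete association
    if ab == greatest:
        return "complete association"
    if ab == lowest:
        return "complete disassociation"
    if ab > expected:
        return "positive association"
    if expected > ab:
        return "negative association (disassociation)"
    return "independence"

print(describe_association(100, 44, 53, 26))     # positive association
print(describe_association(1024, 144, 384, 54))  # independence
```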

7. When the association is very slight, i.e. where (AB) only differs from (A)(B)/N by a few units or by a small proportion, it may be that such association is not really significant of any definite relationship. To give an illustration, suppose that a coin is tossed a number of times, and the tosses noted in pairs; then 100 pairs may give such results as the following (taken from an actual record):

First toss heads and second heads . . . 26
First toss heads and second tails . . . 18
First toss tails and second heads . . . 27
First toss tails and second tails . . . 29

        <pb n="50" />

If we use A to denote "heads" in the first toss, and B "heads" in the second, we have from the above (A) = 44, (B) = 53. Hence (A)(B)/N = 44 × 53/100 = 23.32, while actually (AB) is 26. Hence
there is a positive association, in the given record, between
the result of the first throw and the result of the second. But it
is fairly certain, from the nature of the case, that such association
cannot indicate any real connection between the results of the
two throws; it must therefore be due merely to such a complex
system of causes, impossible to analyse, as leads, for example, to
differences between small samples drawn from the same material.
The conclusion is confirmed by the fact that, of a number of such
records, some give a positive association (like the above), but
others a negative association.
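The last remark can be imitated by simulation. This is a sketch under assumptions: Python's pseudo-random generator stands in for a fair coin, and the record sizes (200 records of 100 pairs each) are arbitrary choices made here. Each record is scored by the sign of (AB) - (A)(B)/N. The sign is computed as N(AB) - (A)(B) so that the arithmetic stays in integers.

```python
import random

def association_sign(pairs):
    """+1, -1 or 0: the sign of (AB) - (A)(B)/N for (first_heads, second_heads) pairs."""
    n = len(pairs)
    a = sum(first for first, _ in pairs)
    b = sum(second for _, second in pairs)
    ab = sum(first and second for first, second in pairs)
    diff = n * ab - a * b  # same sign as (AB) - (A)(B)/N
    return (diff > 0) - (0 > diff)

# The record quoted in the text: 26, 18, 27 and 29 pairs.
yule_record = ([(True, True)] * 26 + [(True, False)] * 18
               + [(False, True)] * 27 + [(False, False)] * 29)
print(association_sign(yule_record))  # 1: positive association

random.seed(0)
records = [[(random.random() > 0.5, random.random() > 0.5) for _ in range(100)]
           for _ in range(200)]
signs = [association_sign(r) for r in records]
print(signs.count(1), signs.count(-1), signs.count(0))
```

With truly independent tosses, both positive and negative signs turn up in roughly equal numbers, which is the pattern the text describes.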

8. An event due, like the above occurrence of positive associa-
tion, to an extremely complex system of causes of the general
nature of which we are aware, but of the detailed operation of
which we are ignorant, is sometimes said to be due to chance, or
better to the chances or fluctuations of sampling.

A little consideration will suggest that such associations due to
the fluctuations of sampling must be met with in all classes of
statistics. To quote, for instance, from § 1, the two illustrations
there given of independent attributes, we know that in any
actual record we would not be likely to find exactly the same
proportion of abnormally wet seasons in leap years as in ordinary
years, nor exactly the same proportion of male births when the
moon is waxing as when it is waning. But so long as the diver-
gence from independence is not well marked we must regard such
attributes as practically independent, or dependence as at least
unproved.

The discussion of the question, how great the divergence must be before we can consider it as "well marked," must be postponed to the chapters dealing with the theory of sampling. At present the attention of the student can only be directed to the existence of the difficulty, and to the serious risk of interpreting a "chance association" as physically significant.

9. The definition of § 5 suggests that we are to test the existence or the intensity of association between two attributes by a comparison of the actual value of (AB) with its independence-value (as it may be termed) (A)(B)/N. The procedure is from the theoretical standpoint perhaps the most natural, but it is more usual, and is simplest and best in practice, to compare proportions, e.g. the proportion of A's amongst the B's with the proportion amongst the β's. Such proportions are usually expressed in the form of percentages or proportions per thousand.

        <pb n="51" />

It will be evident from §§ 1 and 2 that a large number of such
comparisons are available for the purpose, and the question arises,
therefore, which is the best comparison to adopt?

10. Two principles should decide this point: (1) of any two
comparisons, that is the better which brings out the more clearly
the degree of association ; (2) of any two comparisons, that is the
better which illustrates the more important aspect of the problem
under discussion.

The first condition at once suggests that comparisons of the form

(a) $\dfrac{(AB)}{(B)}$ with $\dfrac{(A\beta)}{(\beta)}$

are better than comparisons of the form

(b) $\dfrac{(AB)}{(B)}$ with $\dfrac{(A)}{N}$.

For it is evident that if most of the objects or individuals in the universe are B's, i.e. if (B)/N approaches unity, (AB)/(B) will necessarily approach (A)/N even though the difference between (AB)/(B) and (Aβ)/(β) is considerable: (A) = (AB) + (Aβ), and (Aβ) can be no greater than (β), which is then small. The second form of comparison may therefore be misleading.
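A numerical illustration may make the point concrete. The figures below are hypothetical and do not come from the text. In a universe of 1000 in which 950 are B, only 10 per cent. of the B's are A, whereas half of the β's are A. Comparison (a) shows the contrast plainly. Comparison (b) sets 0.10 against 0.12, which looks almost like independence.

```python
from fractions import Fraction

# Hypothetical figures (not from the text): most of the universe is B.
n, b = 1000, 950
ab, a_beta = 95, 25            # (AB) and (A beta)
beta = n - b                   # (beta) = 50
a = ab + a_beta                # (A) = 120

form_a = (Fraction(ab, b), Fraction(a_beta, beta))  # (AB)/(B) against (A beta)/(beta)
form_b = (Fraction(ab, b), Fraction(a, n))          # (AB)/(B) against (A)/N
print([float(x) for x in form_a])  # [0.1, 0.5]
print([float(x) for x in form_b])  # [0.1, 0.12]
```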

Setting aside, then, comparisons of the general form (b), the question remains whether to apply the comparison of the form (a) to the rows or the columns of the table, if the data are tabulated as on p. 26. This question must be decided with reference to the second principle, i.e. with regard to the more important aspect of the problem under discussion, the exact question to be answered, or the hypothesis to be tested, as illustrated by the examples below. Where no definite question has to be answered or hypothesis tested, both pairs of proportions may be tabulated, as in Example vi.

Example v.—Association between inoculation against cholera and exemption from attack. (Data from Greenwood and Yule, Table III., ref. 6.)

                    Not attacked.    Attacked.    Total.
Inoculated                276             3         279
Not inoculated            473            66         539
Total                     749            69         818
        <pb n="52" />

Here the important question is, How far does inoculation
protect from attack? The most natural comparison is therefore:

Percentage of inoculated who were not attacked . . . 98.9
Percentage of not inoculated who were not attacked . . . 87.8

or we might tabulate the complementary proportions:

Percentage of inoculated who were attacked . . . 1.1
Percentage of not inoculated who were attacked . . . 12.2

Either comparison brings out simply and clearly the fact that
inoculation and exemption from attack are positively associated
(inoculation and attack negatively associated).

We are making above a comparison by rows in the notation of
the table on p. 26, comparing (AB)/(A) with (αB)/(α), or (Aβ)/(A) with (αβ)/(α). A comparison by columns, e.g. (AB)/(B) with (Aβ)/(β), would serve equally to indicate whether there was any appreciable association, but would not answer directly the particular question we have in mind:

Percentage of not-attacked who were inoculated . . . 36.8
Percentage of attacked who were inoculated . . . 4.3
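The percentages of Example v can be recomputed from the fourfold table. This is a sketch: the dictionary layout and the helper `pct` are choices made here, not taken from the text.

```python
# Example v.: (not attacked, attacked) for each row of the table.
table = {"inoculated": (276, 3), "not inoculated": (473, 66)}

def pct(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)

# Comparison by rows: share of each group that escaped attack, or was attacked.
spared = {row: pct(s, s + a) for row, (s, a) in table.items()}
attacked = {row: pct(a, s + a) for row, (s, a) in table.items()}

# Comparison by columns: share of each column that had been inoculated.
col_spared = sum(s for s, _ in table.values())
col_attacked = sum(a for _, a in table.values())
by_columns = (pct(table["inoculated"][0], col_spared),
              pct(table["inoculated"][1], col_attacked))

print(spared)      # {'inoculated': 98.9, 'not inoculated': 87.8}
print(attacked)    # {'inoculated': 1.1, 'not inoculated': 12.2}
print(by_columns)  # (36.8, 4.3)
```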

Example vi—Deaf-mutism and Imbecility. (Material from
Census of 1901. Summary Tables. [Cd. 1523.])

Total population of England and Wales . . . 32,528,000
Number of the imbecile (or feeble-minded) . . . 48,882
Number of deaf-mutes . . . 15,246
Number of imbecile deaf-mutes . . . 451

Required, to find whether deaf-mutism is associated with
imbecility.

We may denote the number of the imbecile by (A), of deaf-mutes by (B). A comparison of (AB)/(B) with (A)/N or of (AB)/(A) with (B)/N may very well be used in this case, seeing that (A)/N and (B)/N are both small. The question whether to give the preference to the first or the second comparison depends on the nature of the investigation we wish to make. If it is desired to exhibit the conditions among deaf-mutes the first may be used:

Proportion of imbeciles among deaf-mutes, (AB)/(B) . . . 29.6 per thousand.
Proportion of imbeciles in the whole population, (A)/N . . . 1.5 per thousand.

        <pb n="53" />

If, on the other hand, it is desired to exhibit the conditions
amongst the imbecile, the second will be preferable:

Proportion of deaf-mutes amongst the imbecile, (AB)/(A) . . . 9.2 per thousand.
Proportion of deaf-mutes in the whole population, (B)/N . . . 0.5 per thousand.

Either comparison exhibits very clearly the high degree of asso-
ciation between the attributes. It may be pointed out, however,
that census data as to such infirmities are very untrustworthy.
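The four proportions per thousand in Example vi follow directly from the census figures. A brief sketch; the variable names and the helper are choices made here.

```python
n, imbecile, deaf_mute, both = 32_528_000, 48_882, 15_246, 451

def per_thousand(part, whole):
    """Proportion per thousand, rounded to one decimal place."""
    return round(1000 * part / whole, 1)

print(per_thousand(both, deaf_mute), per_thousand(imbecile, n))  # 29.6 1.5
print(per_thousand(both, imbecile), per_thousand(deaf_mute, n))  # 9.2 0.5
```

Both pairs differ by a factor of roughly twenty. This is the high degree of association the text refers to, subject to its caution about the census data.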

Example vii.—Eye-colour of father and son (material due to Sir Francis Galton, as given by Professor Karl Pearson, Phil. Trans., A, vol. cxcv. (1900), p. 138; the classes 1, 2, and 3 of the memoir treated as light).

Fathers with light eyes and sons with light eyes (AB) . . . 471
Fathers with light eyes and sons with not-light eyes (Aβ) . . . 151
Fathers with not-light eyes and sons with light eyes (αB) . . . 148
Fathers with not-light eyes and sons with not-light eyes (αβ) . . . 230

Required to find whether the colour of the son's eyes is associated with that of the father's. In cases of this kind the father is reckoned once for each son; e.g. a family in which the father was light-eyed, two sons light-eyed and one not, would be reckoned as giving two to the class AB and one to the class Aβ.

The best comparison here is:

Percentage of light-eyed amongst the sons of light-eyed fathers . . . 76 per cent.
Percentage of light-eyed amongst the sons of not-light-eyed fathers . . . 39.2 per cent.

But the following is equally valid:

Percentage of light-eyed amongst the fathers of light-eyed sons . . . 76 per cent.
Percentage of light-eyed amongst the fathers of not-light-eyed sons . . . 40 per cent.

The reason why the former comparison is preferable is, that we
usually wish to estimate the character of offspring from that of
the parents, and define heredity in terms of the resemblance of
offspring to parents. We do not, as a rule, want to make use of
the power of estimating the character of parents from that of their
offspring, nor do we define heredity in terms of the resemblance
of parents to offspring. Both modes of statement, however,

        <pb n="54" />
indicate equally clearly the tendency to resemblance between
father and son.
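As a check on the two comparisons, the percentages can be recomputed from the four frequencies. This is a minimal Python sketch. The dictionary keys are an ad hoc labelling, with lower-case a and b standing for α and β. The value (αB) = 148 is the one consistent with the percentages quoted above.

```python
# Example vii: A = light-eyed father, B = light-eyed son.
# Keys are an ad hoc labelling: "a" stands for alpha, "b" for beta.
freq = {"AB": 471, "Ab": 151, "aB": 148, "ab": 230}

def pct(part, whole):
    """Percentage, rounded to a whole number as in the text."""
    return round(100 * part / whole)

# Preferred comparison: light-eyed sons, classified by the father.
sons_of_light = pct(freq["AB"], freq["AB"] + freq["Ab"])         # (AB)/(A)
sons_of_not_light = pct(freq["aB"], freq["aB"] + freq["ab"])     # (aB)/(a)

# Equally valid comparison: light-eyed fathers, classified by the son.
fathers_of_light = pct(freq["AB"], freq["AB"] + freq["aB"])      # (AB)/(B)
fathers_of_not_light = pct(freq["Ab"], freq["Ab"] + freq["ab"])  # (Ab)/(b)

print(sons_of_light, sons_of_not_light)         # 76 39
print(fathers_of_light, fathers_of_not_light)   # 76 40
```

Both pairs show a gap of the same order. Only the direction of the conditioning differs.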

Example viii.—Association between inoculation against cholera
and exemption from attack, five separate epidemics (cf. Example
v., data from Tables IX., X., XXVIII., XXIX., XXXI. of
reference 6).

                  Not attacked.   Attacked.    Total.
Inoculated              192             4         196
Not inoculated          113            34         147
Total                   305            38         343

                  Not attacked.   Attacked.    Total.
Inoculated            5,751            27       5,778
Not inoculated        6,351           198       6,549
Total                12,102           225      12,327

                  Not attacked.   Attacked.    Total.
Inoculated            4,087             5       4,092
Not inoculated      113,856         1,144     115,000
Total               117,943         1,149     119,092

                  Not attacked.   Attacked.    Total.
Inoculated            8,332             8       8,340
Not inoculated       84,444           556      85,000
Total                92,776           564      93,340

                  Not attacked.   Attacked.    Total.
Inoculated            4,870             5       4,875
Not inoculated      153,096           904     154,000
Total               157,966           909     158,875

With the table of Example v. the above give data for six
separate epidemics, in all of which the same method of inoculation
        <pb n="55" />
appears to have been used: the data refer to natives only,
and the numbers of observations are sufficiently large to reduce
“fluctuations of sampling” within reasonably narrow limits.
The proportions not attacked are as follows:—

Proportion not attacked.
Not inoculated.   Inoculated.   Difference.
    0.8776          0.9892        0.1116
    0.7687          0.9796        0.2109
    0.9698          0.9953        0.0255
    0.9901          0.9988        0.0087
    0.9935          0.9990        0.0055
    0.9941          0.9990        0.0049
In each case inoculation and exemption from attack are positively
associated, but it will be seen that the several proportions, and
the differences between them, vary considerably. Evidently in
a very mild epidemic this difference can only be small, and the
question arises how far the data for the separate epidemics can
be said to be consistent in their indication of the “efficiency”
of the inoculation. This is not a simple question to answer:
the more advanced student is referred to the discussion in the
original.
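The proportions tabulated above follow directly from the five fourfold tables. This is a minimal Python sketch covering the five epidemics of this example; the first row of the table comes from Example v. and is not recomputed. It assumes the inoculated-and-attacked count of the second epidemic is 27, that is the row total 5,778 less 5,751. As in the table, each difference is taken between the rounded proportions.

```python
# Five epidemics of Example viii. Each entry is
# ((inoculated not attacked, inoculated attacked),
#  (not inoculated not attacked, not inoculated attacked)).
epidemics = [
    ((192, 4), (113, 34)),
    ((5751, 27), (6351, 198)),
    ((4087, 5), (113856, 1144)),
    ((8332, 8), (84444, 556)),
    ((4870, 5), (153096, 904)),
]

def proportion_not_attacked(not_attacked, attacked):
    return not_attacked / (not_attacked + attacked)

rows = []
for inoculated, not_inoculated in epidemics:
    p_not = round(proportion_not_attacked(*not_inoculated), 4)
    p_inoc = round(proportion_not_attacked(*inoculated), 4)
    # Difference of the rounded proportions, as printed in the text.
    rows.append((p_not, p_inoc, round(p_inoc - p_not, 4)))

for row in rows:
    print("%.4f  %.4f  %.4f" % row)
```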
11. The values that the four second-order frequencies take in
the case of independence, viz.—
$$\frac{(A)(B)}{N}, \qquad \frac{(\alpha)(B)}{N}, \qquad \frac{(A)(\beta)}{N}, \qquad \frac{(\alpha)(\beta)}{N},$$
are of such great theoretical importance, and of so much use
as reference-values for comparing with the actual values of
the frequencies (AB), (αB), (Aβ) and (αβ), that it is often desirable
to employ single symbols to denote them. We shall use
the symbols—
$$(AB)_0 = \frac{(A)(B)}{N}, \quad (\alpha\beta)_0 = \frac{(\alpha)(\beta)}{N}, \quad (\alpha B)_0 = \frac{(\alpha)(B)}{N}, \quad (A\beta)_0 = \frac{(A)(\beta)}{N}.$$

-
3:
3
        <pb n="56" />
If δ denote the excess of (AB) over (AB)₀, then, in order to keep
the totals of rows and columns constant, the general table
(cf. the table for the case of independence on p. 27) must
be of the form

                        Attribute.
Attribute.          B                β           Total.
A             (AB)₀ + δ        (Aβ)₀ − δ          (A)
α             (αB)₀ − δ        (αβ)₀ + δ          (α)
Total.           (B)              (β)              N

Therefore, quite generally we have—
$$(AB) - (AB)_0 = (\alpha\beta) - (\alpha\beta)_0 = (A\beta)_0 - (A\beta) = (\alpha B)_0 - (\alpha B).$$
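The four equal deviations can be confirmed numerically. This is a minimal Python sketch using the Example vii eye-colour table, with (αB) taken as 148, and the function name `reference_values` chosen here for illustration. It computes (AB)₀, (Aβ)₀, (αB)₀ and (αβ)₀ and checks that the observed frequencies depart from them by a common amount δ. Exact fractions are used so that the equalities hold without rounding error.

```python
from fractions import Fraction

def reference_values(AB, Ab, aB, ab):
    """Frequencies expected under independence ("a", "b" stand for alpha, beta)."""
    N = AB + Ab + aB + ab
    A, a = AB + Ab, aB + ab     # row totals
    B, b = AB + aB, Ab + ab     # column totals
    return {
        "AB": Fraction(A * B, N), "Ab": Fraction(A * b, N),
        "aB": Fraction(a * B, N), "ab": Fraction(a * b, N),
    }

obs = {"AB": 471, "Ab": 151, "aB": 148, "ab": 230}   # Example vii
ref = reference_values(**obs)

# (AB) - (AB)_0, (ab) - (ab)_0, (Ab)_0 - (Ab), (aB)_0 - (aB)
diffs = [obs["AB"] - ref["AB"], obs["ab"] - ref["ab"],
         ref["Ab"] - obs["Ab"], ref["aB"] - obs["aB"]]
delta = diffs[0]
assert all(d == delta for d in diffs)
print(float(delta))   # 85.982
```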

12. The value of this common difference δ may be expressed
in a form that is useful to note. We have by definition—
$$\delta = (AB) - (AB)_0 = (AB) - \frac{(A)(B)}{N}.$$
Bring the terms on the right to a common denominator, and
express all the frequencies of the numerator in terms of those of
the second order; then we have—
$$\delta = \frac{1}{N}\Big\{(AB)\big[(AB) + (A\beta) + (\alpha B) + (\alpha\beta)\big] - \big[(AB) + (A\beta)\big]\big[(AB) + (\alpha B)\big]\Big\} = \frac{1}{N}\Big\{(AB)(\alpha\beta) - (\alpha B)(A\beta)\Big\}.$$

That is to say, the common difference is equal to 1/Nth of the
difference of the “cross-products” (AB)(αβ) and (αB)(Aβ).

It is evident that the difference of the cross-products may be
very large if N be large, although δ is really very small. In
using the difference of the cross-products to test mentally the
sign of the association in a case where all the four second-order
frequencies are given, this should be remembered: the difference
should be compared with N, or it will be liable to suggest a higher
degree of association than actually exists.
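The identity of § 12 and the warning above can both be seen on the third epidemic of Example viii. This is a minimal Python sketch, taking A as inoculated and B as not attacked (an assignment chosen here for illustration). Exact fractions confirm that δ is exactly 1/N of the cross-product difference. A cross-product difference of over four million then corresponds to an excess of only about 34 persons over the independence value in a universe of 119,092.

```python
from fractions import Fraction

def delta_by_definition(AB, Ab, aB, ab):
    """(AB) - (A)(B)/N, computed exactly ("a", "b" stand for alpha, beta)."""
    N = AB + Ab + aB + ab
    return AB - Fraction((AB + Ab) * (AB + aB), N)

def cross_product_difference(AB, Ab, aB, ab):
    """(AB)(alpha beta) - (A beta)(alpha B)."""
    return AB * ab - Ab * aB

# Third epidemic of Example viii: A = inoculated, B = not attacked.
table = dict(AB=4087, Ab=5, aB=113856, ab=1144)
N = sum(table.values())                   # 119092

cpd = cross_product_difference(**table)
delta = delta_by_definition(**table)
assert delta == Fraction(cpd, N)          # delta is exactly 1/N of the difference

print(cpd)                      # 4106248: large, because N is large
print(round(float(delta), 2))   # 34.48: the actual excess over independence
```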

Example ix.—The following data were observed for hybrids of

        <pb n="57" />
Datura (W. Bateson and Miss Saunders, Report to the Evolution
Committee of the Royal Society, 1902):—

Flowers violet, fruits prickly (AB) . . . 47
Flowers violet, fruits smooth (Aβ) . . . 12
Flowers white, fruits prickly (αB) . . . 21
Flowers white, fruits smooth (αβ) . . . 3

Investigate the association between colour of flower and char-
acter of fruit.

Since 3 × 47 = 141 and 12 × 21 = 252, i.e. (AB)(αβ) &lt; (αB)(Aβ),
there is clearly a negative association; 252 − 141 = 111, and at
first sight this considerable difference is apt to suggest a considerable
association. But the numerical value of δ is only 111/83 = 1.3, so that in point of
fact the association is small, so small that no stress can be laid
on it as indicating anything but a fluctuation of sampling.
Working out the percentages we have—

Percentage of violet-flowered plants with prickly fruits . . . 80 per cent.
Percentage of white-flowered plants with prickly fruits . . . 87 per cent.
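The arithmetic of Example ix can be set out as follows. This is a minimal Python sketch that takes the frequencies to be 47, 12, 21 and 3, as implied by the products 3 × 47 and 12 × 21 quoted in the text. Computed with its sign, δ comes out negative, matching the negative association. The white-flowered percentage works out at 87.5, given above as 87.

```python
from fractions import Fraction

# Example ix (Datura): A = violet flowers, B = prickly fruits.
AB, Ab, aB, ab = 47, 12, 21, 3
N = AB + Ab + aB + ab                    # 83

cross = (AB * ab, Ab * aB)               # (141, 252): second larger, so negative
delta = Fraction(AB * ab - Ab * aB, N)   # -111/83

violet_prickly = 100 * AB / (AB + Ab)    # per cent of violet-flowered, prickly
white_prickly = 100 * aB / (aB + ab)     # per cent of white-flowered, prickly

print(cross, round(float(delta), 1))                   # (141, 252) -1.3
print(round(violet_prickly), round(white_prickly, 1))  # 80 87.5
```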

13. While the methods used in the preceding pages suffice for
nearly all practical purposes, it may be convenient to measure
the intensities of association in different cases by means of some
formula or “coefficient,” so devised as to be zero when the attributes
are independent, +1 when they are completely associated,
and −1 when they are completely disassociated, in the sense of
§ 6. If we use the term “complete association” in the wider
sense there defined, we have, grouping the frequencies in fourfold
tables, the three cases of complete association:—

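(1)
              B         β        Total.
A           (AB)        0         (A)
α           (αB)      (αβ)        (α)
Total.      (B)       (β)          N

(2)
              B         β        Total.
A           (AB)      (Aβ)        (A)
α            0        (αβ)        (α)
Total.      (B)       (β)          N

(3)
              B         β        Total.
A           (AB)        0         (A)
α            0        (αβ)        (α)
Total.      (B)       (β)          N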

In the first case all A’s are B, and so (Aβ) = 0; in the second
all B’s are A, and so (αB) = 0; and in the third case we have (A) =
        <pb n="58" />
(B) = (AB), so that all A’s are B and also all B’s are A. The
three corresponding cases of complete disassociation are—

(4)
              B         β        Total.
A            0        (Aβ)        (A)
α           (αB)      (αβ)        (α)
Total.      (B)       (β)          N

(5)
              B         β        Total.
A           (AB)      (Aβ)        (A)
α           (αB)        0         (α)
Total.      (B)       (β)          N

(6)
              B         β        Total.
A            0        (Aβ)        (A)
α           (αB)        0         (α)
Total.      (B)       (β)          N
It is required to devise some formula which shall give the value
+1 in the first three cases, −1 in the second three, and shall
also be zero where the attributes are independent. Many such
formulæ may be devised, but perhaps the simplest possible (though
not necessarily the most advantageous) is the expression—
$$Q = \frac{(AB)(\alpha\beta) - (A\beta)(\alpha B)}{(AB)(\alpha\beta) + (A\beta)(\alpha B)} = \frac{N\delta}{(AB)(\alpha\beta) + (A\beta)(\alpha B)}$$
—where δ is the symbol used in the two last sections for the
difference (AB) − (AB)₀. It is evident that Q is zero when the
attributes are independent, for then δ is zero: it takes the value +1
when there is complete association, for then the second term in
both numerator and denominator of the first form of the expression
is zero: similarly it is −1 where there is complete disassociation,
for then the first term in both numerator and denominator is
zero. Q may accordingly be termed a coefficient of association.
As illustrations of the values it will take in certain cases, the
association between deaf-mutism and imbecility, on the basis of the
English census figures (Example vi.) is +0.91; between light eye-colour
in father and in son (Example vii.) +0.66; between colour of
flower and prickliness of fruit in Datura (Example ix.) −0.28, an
association which, however, as already stated, is probably of no
practical significance and due to mere fluctuations of sampling.
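Two of these values can be reproduced directly from the fourfold tables of Examples vii and ix. The census figures of Example vi are not repeated in this section, so that case is omitted. This is a minimal Python sketch; the function name `yule_q` and the small limiting tables are hypothetical, chosen only to illustrate the values +1, −1 and 0.

```python
def yule_q(AB, Ab, aB, ab):
    """Coefficient of association Q of section 13 ("a", "b" stand for alpha, beta)."""
    p, q = AB * ab, Ab * aB
    return (p - q) / (p + q)

eye_colour = yule_q(471, 151, 148, 230)   # Example vii
datura = yule_q(47, 12, 21, 3)            # Example ix
print(round(eye_colour, 2), round(datura, 2))   # 0.66 -0.28

# Hypothetical limiting tables: (A beta) = 0, then (AB) = 0, then independence.
print(yule_q(50, 0, 30, 20), yule_q(0, 40, 30, 20))  # 1.0 -1.0
print(yule_q(20, 30, 40, 60))                        # 0.0
```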
The student should note that the value of Q for a given table
is unaltered by multiplying either a row or a column by any
arbitrary number, i.e. the value is independent of the relative
proportions of A’s and α’s included in the table. This property
is of importance, and renders such a measure of association
specially adapted to cases (e.g. experiments) in which the proportions
are arbitrary. A form possessing the same property but
certain marked advantages over Q is suggested in ref. (3).
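The invariance follows because multiplying a row or a column multiplies both cross-products by the same factor. This is a minimal Python sketch on the Example vii table, using exact fractions. The scaling factors 3 and 7 are arbitrary choices made here. For contrast, the proportion of A's amongst the B's changes when the α row is scaled, while Q does not.

```python
from fractions import Fraction

def yule_q(AB, Ab, aB, ab):
    """Q as an exact fraction ("a", "b" stand for alpha, beta)."""
    p, q = AB * ab, Ab * aB
    return Fraction(p - q, p + q)

def share_of_A_among_B(AB, Ab, aB, ab):
    return AB / (AB + aB)

base = (471, 151, 148, 230)               # Example vii
row_scaled = (471, 151, 3 * 148, 3 * 230) # alpha row multiplied by 3
col_scaled = (7 * 471, 151, 7 * 148, 230) # B column multiplied by 7

print(yule_q(*base) == yule_q(*row_scaled) == yule_q(*col_scaled))  # True
print(round(share_of_A_among_B(*base), 3),
      round(share_of_A_among_B(*row_scaled), 3))                     # 0.761 0.515
```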

        <pb n="59" />

The coefficient is only mentioned here to direct the attention
of the student to the possibility of forming such a measure of
association, a measure which serves a similar purpose in the case
of attributes to that served by certain other coefficients in the
cases of manifold classification (cf. Chap. V.) and of variables
(cf. Chap. IX., and the references to Chaps. X. and XVI.). For
further illustrations of the use of this coefficient the reader is
referred to the reference (1) at the end of this chapter; for the
modified form of the coefficient, possessing the same properties
but certain advantages, to ref. (3); and for a mode of deducing
another coefficient, based on theorems in the theory of variables,
which has come into more general use, though in the opinion of
the present writer its use is of doubtful advantage, to ref. (4).
Reference should also be made to the coefficient described in § 10
of Chap. XI. The question of the best coefficient to use as a
measure of association is still the subject of controversy: for a
discussion the student is referred to refs. (3), (5), and (6).

14. In concluding this chapter, it may be well to repeat, for the
sake of emphasis, that (cf. § 5) the mere fact of 80, 90, or 99 per
cent. of A’s being B implies nothing as to the association of A
with B; in the absence of information, we can but assume that
80, 90, or 99 per cent. of α’s may also be B. In order to apply
the criterion of independence for two attributes A and B, it is
necessary to have information concerning α’s and β’s as well as
A’s and B’s, or concerning a universe that includes both α’s and
A’s, β’s and B’s. Hence an investigation as to the causal
relations of an attribute A must not be confined to A’s, but must
be extended to α’s (unless, of course, the necessary information
as to α’s is already obtainable): no comparison is otherwise
possible. It would be no use to obtain with great pains the
result (cf. Example vi.) that 29.6 per thousand of deaf-mutes
were imbecile unless we knew that the proportion of imbeciles
in the whole population was only 1.5 per thousand; nor would
it contribute anything to our knowledge of the heredity of deaf-mutism
to find out the proportion of deaf-mutes amongst the
offspring of deaf-mutes unless the proportions amongst the offspring
of normal individuals were also investigated or known.

REFERENCES.

(1) YULE, G. U., “On the Association of Attributes in Statistics,” Phil.
Trans. Roy. Soc., Series A, vol. cxciv., 1900, p. 257. (Deals fully
with the theory of association: the association coefficient of § 13
suggested.)

(2) YULE, G. U., “Notes on the Theory of Association of Attributes in
Statistics,” Biometrika, vol. ii., 1903, p. 121. (Contains an abstract
of the principal portions of (1) and other matter.)

        <pb n="60" />

(3) YULE, G. U., “On the Methods of Measuring the Association between Two
Attributes,” Jour. Roy. Stat. Soc., vol. lxxv., 1912, pp. 579-642. (A
critical survey of the various coefficients that have been suggested for
measuring association and their properties: a modified form of the
coefficient of § 13 given which possesses marked advantages.)

(4) PEARSON, KARL, “On the Correlation of Characters not Quantitatively
Measurable,” Phil. Trans. Roy. Soc., Series A, vol. cxcv., 1900, p. 1.
(Deals with the problem of measurement of intensity of association
from the standpoint of the theory of variables, giving a method which
has since been largely used: only the advanced student will be able to
follow the work. For a criticism see ref. 3.)

(5) PEARSON, KARL, and DAVID HERON, “On Theories of Association,”
Biometrika, vol. ix., 1913, pp. 159-832. (A reply to criticisms in ref. 3.)

(6) GREENWOOD, M., and G. U. YULE, “The Statistics of Anti-typhoid and
Anti-cholera Inoculations, and the interpretation of such statistics in
general,” Proc. Roy. Soc. of Medicine, vol. viii., 1915, p. 118. (Cited
for the discussion of association coefficients in § 4, and the conclusion
that none of these coefficients are of much value for comparative purposes
in interpreting statistics of the type considered.)

(7) LIPPS, G. F., “Die Bestimmung der Abhängigkeit zwischen den Merkmalen
eines Gegenstandes,” Berichte d. math.-phys. Klasse d. kgl. sächsischen
Gesellschaft d. Wissenschaften, Leipzig, Feb. 1905. (Deals with the
general theory of the dependence between two characters, however
classified; the coefficient of association of § 13 is again suggested
independently.)

EXERCISES.

1. At the census of England and Wales in 1901 there were (to the nearest
1000) 15,729,000 males and 16,799,000 females; 3497 males were returned
as deaf-mutes from childhood, and 3072 females.

State proportions exhibiting the association between deaf-mutism from
childhood and sex. How many of each sex for the same total number would
have been deaf-mutes if there had been no association ?

2. Show, as briefly as possible, whether A and B are independent, positively
associated, or negatively associated in each of the following cases:—

(a) N = 5000,     (A) = 2350,   (B) = 3100,   (AB) = 1600
(b) (A) = 490,    (AB) = 294,   (α) = 570,    (αB) = 380
(c) (AB) = 256,   (αB) = 768,   (Aβ) = 48,    (αβ) = 144

3. (Figures derived from Darwin’s Cross- and Self-fertilisation of Plants,
cf. ref. 1, p. 294.) The table below gives the numbers of plants of certain
species that were above or below their average height, stating separately those
that were derived from cross-fertilised and from self-fertilised parentage.
Investigate the association between height and cross-fertilisation of parentage,
and draw attention to any special points you notice.

                        Parentage cross-fertilised.    Parentage self-fertilised.
                        Height—                        Height—
Species.                Above        Below             Above        Below
                        average.     average.          average.     average.
Ipomœa purpurea           63           10                18           55
Petunia violacea          61           16                13           64
Reseda lutea              25            7                11           21
Reseda odorata            39           16                25           30
Lobelia fulgens           17           17                12           22

        <pb n="61" />

4. (Figures from same source as Example vii. p. 33, but material differently
grouped; classes 7 and 8 of the memoir treated as “dark.”) Investigate the
association between darkness of eye-colour in father and son from the following
data:—

Fathers with dark eyes and sons with dark eyes (AB) . . . 50
Fathers with dark eyes and sons with not-dark eyes (Aβ) . . . 79
Fathers with not-dark eyes and sons with dark eyes (αB) . . . 89
Fathers with not-dark eyes and sons with not-dark eyes (αβ) . . . 782

Also tabulate for comparison the frequencies that would have been observed
had there been no heredity, i.e. the values of (AB)₀, (Aβ)₀, etc. (§ 11).

5. (Figures from same source as above.) Investigate the association between
eye colour of husband and eye colour of wife (“assortative mating”) from
the data given below.

Husbands with light eyes and wives with light eyes (AB) . . . 309
Husbands with light eyes and wives with not-light eyes (Aβ) . . . 214
Husbands with not-light eyes and wives with light eyes (αB) . . . 132
Husbands with not-light eyes and wives with not-light eyes (αβ) . . . 119

Also tabulate for comparison the frequencies that would have been observed
had there been strict independence between eye colour of husband and eye
colour of wife, i.e. the values of (AB)₀, etc., as in question 4.

6. (Figures from the Census of England and Wales, 1891, vol. iii.: the data
cannot be regarded as trustworthy.) The figures given below show the
number of males in successive age groups, together with the number of the
blind (A), of the mentally-deranged (B), and the blind mentally-deranged
(AB). Trace the association between blindness and mental derangement
from childhood to old age, tabulating the proportions of insane amongst the
whole population and amongst the blind, and also the association coefficient
Q of § 13. Give a short verbal statement of your results.

N       3,304,230  2,712,521  2,089,010  1,611,077  1,191,789  770,124  444,896  161,692
(A)           844      1,184      1,165      1,501      1,752    1,905    1,932    1,701
(B)         2,820      6,225      8,482      9,214      8,187    5,799    3,412    1,008
(AB)           17          1         19         81         32       34       22        9

7. Show that if

(AB)₁, (αB)₁, (Aβ)₁, (αβ)₁
(AB)₂, (αB)₂, (Aβ)₂, (αβ)₂

be two aggregates corresponding to the same values of (A), (B), (α), and (β),
$$(AB)_1 - (AB)_2 = (\alpha\beta)_1 - (\alpha\beta)_2 = (A\beta)_2 - (A\beta)_1 = (\alpha B)_2 - (\alpha B)_1.$$
8. Show that if δ = (AB) − (AB)₀,
$$N\{(AB) + (\alpha\beta) - (\alpha B) - (A\beta)\} = [(A) - (\alpha)][(B) - (\beta)] + 4N\delta.$$

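9. The existence of association may be tested either by comparison of proportions
(e.g. (AB)/(B) with (Aβ)/(β)), as in §§ 9, 10, or by the value of δ, as
in §§ 11, 12. Show that
$$\delta = \frac{(B)(\beta)}{N}\left\{\frac{(AB)}{(B)} - \frac{(A\beta)}{(\beta)}\right\} = \frac{(A)(\alpha)}{N}\left\{\frac{(AB)}{(A)} - \frac{(\alpha B)}{(\alpha)}\right\}.$$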
        <pb n="62" />
        CHAPTER IV.
PARTIAL ASSOCIATION.

1-2. Uncertainty in interpretation of an observed association—3-5. Source of
the ambiguity: partial associations—6-8. Illusory association due
to the association of each of two attributes with a third—9. Estimation
of the partial associations from the frequencies of the second
order—10-12. The total number of associations for a given number
of attributes—13-14. The case of complete independence.

1. If we find that in any given case
$$(AB) \gtrless \frac{(A)(B)}{N},$$
all that is known is that there is a relation of some sort or kind
between A and B. The result by itself cannot tell us whether
the relation is direct, whether possibly it is only due to “fluctuations
of sampling” (cf. Chap. III. §§ 7-8), or whether it is of any other
particular kind that we may happen to have in our minds at the
moment. Any interpretation of the meaning of the association is
necessarily hypothetical, and the number of possible alternative
hypotheses is in general considerable.

2. The commonest of all forms of alternative hypothesis is of
this kind: it is argued that the relation between the two attributes
A and B is not direct, but due, in some way, to the association of
A with C and of B with C. An illustration or two will make the
matter clearer:—

(1) An association is observed between “vaccination” and
“exemption from attack by small-pox,” i.e. more of the vaccinated
than of the unvaccinated are exempt from attack. It is argued
that this does not imply a protective effect of vaccination, but is
wholly due to the fact that most of the unvaccinated are drawn from
the lowest classes, living in very unhygienic conditions. Denoting
vaccination by A, exemption from attack by B, hygienic conditions by
C, the argument is that the observed association between A and B
is due to the associations of both with C.
        <pb n="63" />

(2) It is observed, at a general election, that a greater
proportion of the candidates who spent more money than their
opponents won their elections than of those who spent less. It
is argued that this does not mean an influence of expenditure on
the result of elections, but is due to the fact that Conservative
principles generally carried the day, and that the Conservatives
generally spent more than the Liberals. Denoting winning by A,
spending more than the opponent by B, and Conservative by C, the
argument is the same as the above (cf. Question 9 at the end of
the chapter).

(3) An association is observed between the presence of some
attribute in the father and its presence in the son; and also
between the presence of the attribute in the grandfather and its
presence in the grandson. Denoting the presence of the attribute
in son, father, and grandfather by A, B, and C, the question arises
whether the association between A and C may not be due solely
to the associations between A and B, B and C, respectively.

3. The ambiguity in such cases evidently arises from the fact
that the universe of observation, in each case, contains not
merely objects possessing the third attribute alone, or objects
not possessing it, but both.

If the universe were restricted to either class alone the given
ambiguity would not arise, though of course others might remain.

Thus, in the first illustration, if the statistics of vaccination
and attack were drawn from one narrow section of the population
living under approximately the same hygienic conditions, and an
association were still observed between vaccination and exemption
from attack, the supposed argument would be refuted. The fact
would prove that the association between vaccination and
exemption could not be wholly due to the association of both with
hygienic conditions.

Again, in the second illustration, if we confine our attention to
the “universe” of Conservatives (instead of dealing with candidates
of both parties together), and compare the percentages of Conserva-
tives winning elections when they spend more than their opponents
and when they spend less, we shall avoid the possible fallacy. If
the percentage is greater in the former case than in the latter, it
cannot be for the reasons suggested in § 2.

The biological case of the third illustration should be similarly
treated. If the association between A and C be observed for
those cases in which all the parents, say, possess the attribute, or
else all do not, and it is still sensible, then the association first
observed between A and C for the whole universe cannot have
been due solely to the observed associations between A and B, B
and C.

        <pb n="64" />

4. The associations observed between the attributes A and B
in the universe of C’s and the universe of γ’s may be termed
partial associations, to distinguish them from the total associations
observed between A and B in the universe at large. In terms of
the definition of § 5 of Chap. III., A and B will be said to be positively
associated in the universe of C’s (cf. § 4 of Chap. II.) when
$$(ABC) &gt; \frac{(AC)(BC)}{(C)},$$
and negatively associated in the converse case.

As in the simpler case, the association is most simply tested by
a comparison of percentages or proportions (§ 9, Chap. III.),
although for some purposes a “coefficient of association” of
some kind may be useful. Confining our attention to the more
fundamental method, if A and B are positively associated within
the universe of C’s, we must have, to quote only the four most
convenient comparisons,
$$\frac{(ABC)}{(BC)} &gt; \frac{(AC)}{(C)} \quad (a) \qquad\qquad \frac{(ABC)}{(AC)} &gt; \frac{(BC)}{(C)} \quad (b)$$
$$\frac{(ABC)}{(BC)} &gt; \frac{(A\beta C)}{(\beta C)} \quad (c) \qquad\qquad \frac{(ABC)}{(AC)} &gt; \frac{(\alpha BC)}{(\alpha C)} \quad (d)$$

These inequalities may easily be rewritten for any other case by
making the proper substitutions in the symbols; thus to obtain
the inequalities for testing the association between A and C in
the universe of B’s, B must be written for C, β for γ, and vice
versâ, throughout; it being remembered that the order of the
letters in the class symbol is immaterial. The remarks of § 10,
Chap. III., as to the choice of the comparison to be used, apply of
course equally to the present case.
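The four comparisons (a) to (d) always agree in sign with the quantity (ABC) − (AC)(BC)/(C). Each difference reduces to a positive multiple of (ABC)(αβC) − (AβC)(αBC), so any one of them may be used. This is a minimal Python sketch of that check. The function names and the two sets of frequencies are hypothetical, chosen only for illustration.

```python
from fractions import Fraction

def sign(x):
    """Return 1, 0 or -1 according to the sign of x."""
    if x == 0:
        return 0
    return 1 if x == abs(x) else -1

def partial_comparisons(ABC, AbC, aBC, abC):
    """Criterion and comparisons (a)-(d) for A and B within the universe of C's.

    Arguments are the four class frequencies inside that universe
    (lower-case a, b stand for alpha, beta).
    """
    C = ABC + AbC + aBC + abC
    AC, BC = ABC + AbC, ABC + aBC
    criterion = ABC - Fraction(AC * BC, C)       # (ABC) - (AC)(BC)/(C)
    comparisons = {
        "a": (Fraction(ABC, BC), Fraction(AC, C)),
        "b": (Fraction(ABC, AC), Fraction(BC, C)),
        "c": (Fraction(ABC, BC), Fraction(AbC, AbC + abC)),
        "d": (Fraction(ABC, AC), Fraction(aBC, aBC + abC)),
    }
    return criterion, comparisons

crit, comps = partial_comparisons(40, 10, 20, 30)   # hypothetical frequencies
signs = {k: sign(left - right) for k, (left, right) in comps.items()}
print(sign(crit), signs)   # 1 {'a': 1, 'b': 1, 'c': 1, 'd': 1}
```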

5. Though we shall confine ourselves in the present work to
the detailed discussion of the case of three attributes, it should be
noticed that precisely similar conceptions and formulæ to the
above apply in the general case where more than three attributes
have been noted, or where the relations of more than three have
to be taken into account. If, when it is observed that A and B
are still associated within the universe of C’s, it is argued that
this is due to the association of both A and B with D, the argument
may be tested by still further limiting the field of observation
to the universe CD. If
$$(ABCD) &gt; \frac{(ACD)(BCD)}{(CD)},$$
A and B are positively associated within the universe of CD’s,
and the association cannot be wholly ascribed to the presence and
        <pb n="65" />
absence of D as suggested, nor to the presence and absence of
C and D conjointly. If it be then argued that the presence
and absence of E is the source of association, the process may
be repeated as before, the association of A and B being tested
for the universe CDE, and so on as far as practicable.

Partial associations thus form the basis of discussion for any
case, however complicated. The two following examples will
serve as illustrations for the case of three attributes.

Example i.—(Material from ref. 5 of Chap. I.)

The following are the proportions per 10,000 of boys observed
with certain classes of defects, amongst a number of school
children. (A) denotes the number with development defects, (B)
with nerve-signs, (D) the number of the “dull.”

N        10,000        (AB)     338
(A)         877        (AD)     338
(B)       1,086        (BD)     455
(D)         789        (ABD)    153
The Report from which the figures are drawn concludes that “the
connecting link between defects of body and mental dulness is
the coincident defect of brain which may be known by observation
of abnormal nerve-signs.” Discuss this conclusion.

The phrase “connecting link” is a little vague, but it may
mean that the mental defects indicated by nerve-signs B may
give rise to development-defects A, and also to mental dulness
D; A and D being thus common effects of the same cause
B (or another attribute necessarily indicated by B), and not
directly influencing each other. The case is thus similar to that
of the first illustration of § 2 (liability to small-pox and to non-vaccination
being held to be common effects of the same circumstances),
and may be similarly treated by investigation of the
partial associations between A and D for the universes B and β.
As the ratios (A)/N, (B)/N, (D)/N are small, comparisons of the
form (4) (5) of Chap. III. (p. 31), or (a) (b) above, may very
well be used (cf. the remarks in § 10 of the same chapter,
p. 31).

The following figures illustrate, then, the association between
A and D for the whole universe, the B-universe and the β-universe:—

For the entire material:—

Proportion of the dull = (D)/N . . . 7.9 per cent.
Proportion of the defectively developed who were dull = (AD)/(A) . . . 38.5 per cent.

        <pb n="66" />
For those exhibiting nerve-signs:—

Proportion of the dull = (BD)/(B) . . . 41.9 per cent.
Proportion of the defectively developed who were dull = (ABD)/(AB) . . . 45.3 per cent.

For those not exhibiting nerve-signs:—

Proportion of the dull = (βD)/(β) . . . 3.7 per cent.
Proportion of the defectively developed who were dull = (AβD)/(Aβ) . . . 34.3 per cent.

The results are extremely striking; the association between A
and D is very high indeed both for the material as a whole (the
universe at large) and for those not exhibiting nerve-signs (the
β-universe), but it is very small for those who do exhibit nerve-signs
(the B-universe).

This result does not appear to be in accord with the conclusion
of the Report, as we have interpreted it, for the association
between A and D in the β-universe should in that case have
been very low instead of very high.
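The six proportions can be derived from the seven tabulated frequencies. The β-classes are obtained by subtraction: (βD) = (D) − (BD), (β) = N − (B), (AβD) = (AD) − (ABD) and (Aβ) = (A) − (AB). This is a minimal Python sketch, using the figures (A) = 877 and (AD) = 338 as printed in the table of this example.

```python
# Example i, per 10,000 boys: A = development defects,
# B = nerve-signs, D = dulness.
N, A, B, D = 10_000, 877, 1086, 789
AB, AD, BD, ABD = 338, 338, 455, 153

def pct(part, whole):
    return round(100 * part / whole, 1)

# Whole universe: dull overall, and dull amongst the defectively developed.
whole = (pct(D, N), pct(AD, A))
# B-universe (nerve-signs present).
with_signs = (pct(BD, B), pct(ABD, AB))
# beta-universe (nerve-signs absent): beta classes found by subtraction.
bD, b = D - BD, N - B            # (beta D), (beta)
AbD, Ab = AD - ABD, A - AB       # (A beta D), (A beta)
without_signs = (pct(bD, b), pct(AbD, Ab))

results = {"all": whole, "B": with_signs, "beta": without_signs}
for label, (dull, dull_if_defective) in results.items():
    print(label, dull, dull_if_defective)
```

In the β-universe the gap between the two proportions (3.7 against 34.3 per cent) stays large. This is the feature that tells against the Report's interpretation.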

Example ii.—Eye-colour of grandparent, parent and child.
(Material from Sir Francis Galton’s Natural Inheritance (1889),
table 20, p. 216. The table only gives particulars for 78 large
families with not less than 6 brothers or sisters, so that the
material is hardly entirely representative, but serves as a good
illustration of the method.) The original data are treated as in
Example vii. of the last chapter (p. 33). Denoting a light-eyed
child by A, parent by B, grandparent by C, every possible line of
descent is taken into account. Thus, taking the following two
lines of the table,

          Children               Parents               Grandparents
      A.          α.         B.          β.         C.          γ.
    Light-     Not-light-  Light-     Not-light-  Light-     Not-light-
    eyed.      eyed.       eyed.      eyed.       eyed.      eyed.
      4           5          1           1          1           3
      3           4          1           1          4           0

the first would give 4 × 1 × 1 = 4 to the class ABC, 4 × 1 × 3 = 12 to
the class ABγ, 4 to AβC, 12 to Aβγ, 5 to αBC, 15 to αBγ, 5 to
αβC, and 15 to αβγ; the second would give 3 × 1 × 4 = 12 to the
class ABC, 12 to AβC, 16 to αBC, 16 to αβC, and none to the remainder.
The class-frequencies so derived from the whole table are,

(ABC)    1928         (αBC)    303
(ABγ)     596         (αBγ)    225
(AβC)     552         (αβC)    395
(Aβγ)     508         (αβγ)    501
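The rule for counting lines of descent can be expressed as a small function. Each class receives the product of the three counts that define it. This is a minimal Python sketch; the function name `classes_from_row` is hypothetical, and lower-case a, b, c stand for α, β, γ. Applied to the two lines quoted above, it reproduces the class contributions given in the text.

```python
from itertools import product

def classes_from_row(children, parents, grandparents):
    """Distribute one line of the table over the eight classes ABC ... abc.

    Each argument is a pair (light-eyed, not light-eyed). Every line of
    descent counts once, so a class receives the product of its three counts.
    """
    out = {}
    for (ka, na), (kb, nb), (kc, nc) in product(
        zip("Aa", children), zip("Bb", parents), zip("Cc", grandparents)
    ):
        out[ka + kb + kc] = na * nb * nc
    return out

first = classes_from_row((4, 5), (1, 1), (1, 3))
second = classes_from_row((3, 4), (1, 1), (4, 0))
print(first)
print({k: v for k, v in second.items() if v})   # classes with a non-zero count
```

Summing such dictionaries over every line of the table gives the class-frequencies listed above.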

        <pb n="67" />

The following comparisons indicate the association between
grandparents and parents, parents and children, and grand-
parents and grandchildren, respectively :—

Grandparents and Parents.

Proportion of light-eyed amongst the children of light-eyed
grandparents = (BC)/(C) = 2231/3178 = 70.2 per cent.
Proportion of light-eyed amongst the children of not-light-eyed
grandparents = (Bγ)/(γ) = 821/1830 = 44.9 per cent.

Parents and Children.

Proportion of light-eyed amongst the children of light-eyed
parents = (AB)/(B) = 2524/3052 = 82.7 per cent.
Proportion of light-eyed amongst the children of not-light-eyed
parents = (Aβ)/(β) = 1060/1956 = 54.2 per cent.

In both the above cases we are really dealing with the
association between parent and offspring, and consequently the
intensity of association is, as might be expected, approximately
the same ; in the next case it is naturally lower: —

Grandparents and Grandchildren.

Proportion of light-eyed amongst the grandchildren of light-eyed
grandparents = (AC)/(C) = 2480/3178 = 78.0 per cent.
Proportion of light-eyed amongst the grandchildren of not-light-eyed
grandparents = (Aγ)/(γ) = 1104/1830 = 60.3 per cent.

We proceed now to test the partial associations between grand-
parents and grandchildren, as distinct from the total associations
given above, in order to throw light on the real nature of the
resemblance. There are two such partial associations to be
tested : (1) where the parents are light-eyed, (2) where they are
not-light-eyed. The following are the comparisons :—

Grandparents and Grandchildren: Parents light-eyed.

Proportion of light-eyed amongst the grandchildren of light-eyed
grandparents = (ABC)/(BC) = 1928/2231 = 86.4 per cent.
Proportion of light-eyed amongst the grandchildren of not-light-eyed
grandparents = (ABγ)/(Bγ) = 596/821 = 72.6 per cent.

        <pb n="68" />
Grandparents and Grandchildren: Parents not-light-eyed.

Proportion of light-eyed amongst the grandchildren of light-eyed
grandparents = (AβC)/(βC) = 552/947 = 58.3 per cent.
Proportion of light-eyed amongst the grandchildren of not-light-eyed
grandparents = (Aβγ)/(βγ) = 508/1009 = 50.3 per cent.

In both cases the partial association is quite well-marked and
positive ; the total association between grandparents and grand-
children cannot, then, be due wholly to the total associations
between grandparents and parents, parents and children, re-
spectively. There is an ancestral heredity, as it is termed, as
well as a parental heredity.

We need not discuss the partial association between children and
parents, as it is comparatively of little consequence. It may be
noted, however, as regards the above results, that the most
important feature may be brought out by stating three ratios
only.

If A and B are positively associated, (AB)/(B) &gt; (A)/N.

If A and C are positively associated in the universe of B’s,
(ABC)/(BC) &gt; (AB)/(B). Hence (A)/N, (AB)/(B), and (ABC)/(BC)
form an ascending series. Thus we have from the given data—

Proportion of light-eyed amongst the children = (A)/N = 71.6 per cent.
Proportion of light-eyed amongst the children of light-eyed
parents = (AB)/(B) = 82.7 per cent.
Proportion of light-eyed amongst the children of light-eyed
parents and grandparents = (ABC)/(BC) = 86.4 per cent.
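All of the proportions in Example ii come from the eight class-frequencies. This sketch covers the partial associations and the ascending series. It is a minimal Python version: the helper `total` and its pattern notation (a dot meaning "either") are assumptions made for illustration, and lower-case a, b, c stand for α, β, γ.

```python
# Class frequencies of Example ii: A = light-eyed child,
# B = light-eyed parent, C = light-eyed grandparent.
f = {"ABC": 1928, "ABc": 596, "AbC": 552, "Abc": 508,
     "aBC": 303, "aBc": 225, "abC": 395, "abc": 501}

def total(pattern):
    """Sum the classes matching a pattern such as 'A.C' ('.' = either)."""
    return sum(n for k, n in f.items()
               if all(p == "." or p == ch for p, ch in zip(pattern, k)))

def pct(num, den):
    return round(100 * total(num) / total(den), 1)

N = total("...")
series = (round(100 * total("A..") / N, 1),   # (A)/N
          pct("AB.", ".B."),                  # (AB)/(B)
          pct("ABC", ".BC"))                  # (ABC)/(BC)
partial = {
    "parents light": (pct("ABC", ".BC"), pct("ABc", ".Bc")),
    "parents not light": (pct("AbC", ".bC"), pct("Abc", ".bc")),
}
print(N, series)   # 5008 (71.6, 82.7, 86.4)
print(partial)
```

Within each parental universe the grandchildren of light-eyed grandparents are still more often light-eyed. This positive partial association is what the text calls ancestral heredity.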

If the great-grandparents, etc., were also known, the series
might be continued, giving (ABCD)/(BCD), (ABCDE)/(BCDE),
and so forth. The series would probably ascend continuously,
though with smaller intervals, A and D being positively associated
in the universe of BC’s, A and E in the universe of BCD’s, etc.

6. The above examples will serve to illustrate the practical
application of partial associations to concrete cases. The general
nature of the fallacies involved in interpreting associations
between two attributes as if they were necessarily due to the
most obvious form of direct causation is more clearly exhibited
by the following theorem :—

If A and B are independent within the universe of C’s and also
within the universe of v's, they will nevertheless be associated
within the universe at large, unless C vs independent of either A
or B or both.

        <pb n="69" />
The two data give:
$$(ABC) = \frac{(AC)(BC)}{(C)}$$
$$(AB\gamma) = \frac{(A\gamma)(B\gamma)}{(\gamma)} = \frac{\{(A) - (AC)\}\{(B) - (BC)\}}{(\gamma)}$$
Adding them together we have:
$$(AB) = \frac{1}{(C)(\gamma)}\left\{N(AC)(BC) - (A)(C)(BC) - (B)(C)(AC) + (A)(B)(C)\right\}$$
Write, as in § 11 of Chap. III. (p. 35):
$$(AB)_0 = \frac{(A)(B)}{N}, \qquad (AC)_0 = \frac{(A)(C)}{N}, \qquad (BC)_0 = \frac{(B)(C)}{N};$$
subtract (AB)₀ from both sides of the above equation, simplify,
and we have
$$(AB) - (AB)_0 = \frac{N}{(C)(\gamma)}\left\{(AC) - (AC)_0\right\}\left\{(BC) - (BC)_0\right\} \qquad (4)$$
This proves the theorem; for the right-hand side will not be
zero unless either (AC) = (AC)₀ or (BC) = (BC)₀.
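Equation (4) can be checked numerically with exact fractions. The Python sketch below (function names are illustrative) computes (AB) on the supposition that A and B are independent both among the C's and among the γ's, and compares (AB) − (AB)₀ with the right-hand side of (4). It borrows the figures of the hospital illustration in § 7 (A = died, B = treated, C = male; N = 200, (A) = 90, (B) = 120, (C) = 100, (AC) = 30, (BC) = 80), where the treatment is assumed to have no effect within either sex.

```python
from fractions import Fraction as F

def ab_within_independent(n, a, b, c, ac, bc):
    """(AB) if A and B are independent among the C's and among the gammas."""
    gamma = n - c
    return F(ac * bc, c) + F((a - ac) * (b - bc), gamma)

def rhs_equation_4(n, a, b, c, ac, bc):
    """N / ((C)(gamma)) * {(AC) - (AC)_0} * {(BC) - (BC)_0}."""
    gamma = n - c
    return F(n, c * gamma) * (ac - F(a * c, n)) * (bc - F(b * c, n))

# Hospital figures: A = died, B = treated, C = male.
n, a, b, c, ac, bc = 200, 90, 120, 100, 30, 80
ab = ab_within_independent(n, a, b, c, ac, bc)
ab_0 = F(a * b, n)            # value of (AB) if A and B were independent overall
print(ab, ab_0, ab - ab_0, rhs_equation_4(n, a, b, c, ac, bc))   # 48 54 -6 -6
```

Here (AC) falls short of (AC)₀ while (BC) exceeds (BC)₀, so the two factors differ in sign and the resulting association between A and B is negative, although it is wholly illusory.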

7. The result indicates that, while no degree of heterogeneity
in the universe can influence the association between A and B
if all other attributes are independent of either A or B or both,
an illusory or misleading association may arise in any case where
there exists in the given universe a third attribute C with which
both A and B are associated (positively or negatively). If both
associations are of the same sign, the resulting illusory association
between A and B will be positive; if of opposite sign, negative.
The three illustrations of § 2 are all of the first kind. In (1) it
is argued that the positive associations between vaccination and
hygienic conditions, exemption from attack and hygienic conditions,
give rise to an illusory positive association between vaccination
and exemption from attack. In (2) it is argued that the positive
associations between conservative and winning, conservative and
spending more, give rise to an illusory positive association between
winning and spending more. In (3) the question is raised whether
the positive association between grandparent and grandchild may
not be due solely to the positive associations between grandparent
and parent, parent and child.

Misleading associations of this kind may easily arise through
        <pb n="70" />
the mingling of records, e.g. respecting the two sexes, which a
careful worker would keep distinct.

Take the following case, for example. Suppose there have been
200 patients in a hospital, 100 males and 100 females, suffering
from some disease. Suppose, further, that the death-rate for males
(the case mortality) has been 30 per cent., for females 60 per cent.
A new treatment is tried on 80 per cent. of the males and 40 per
cent. of the females, and the results published without distinction
of sex. The three attributes, with the relations of which we are
here concerned, are death, treatment and male sex. The data show
that more males were treated than females, and more females
died than males ; therefore the first attribute is associated nega-
tively, the second positively, with the third. It follows that there
will be an illusory negative association between the first two—
death and treatment. If the treatment were completely ineffective
we would, in fact, have the following results:

                                      Males.   Females.   Total.
Treated and died . . . . . . .          24        24        48
Treated and did not die . . . .         56        16        72
Not treated and died . . . . .           6        36        42
Not treated and did not die . .         14        24        38
i.e. of the treated, only 48/120 = 40 per cent. died, while of those
not treated 42/80 = 52.5 per cent. died. If this result were stated
without any reference to the fact of the mixture of the sexes, to
the different proportions of the two that were treated and to the
different death-rates under normal treatment, then some value in
the new treatment would appear to be suggested. To make
a fair return, either the results for the two sexes should be
stated separately, or the same proportion of the two sexes
must receive the experimental treatment. Further, care would
have to be taken in such a case to see that there was no
selection (perhaps unconscious) of the less severe cases for treat-
ment, thus introducing another source of fallacy (death positively
associated with severity, treatment negatively associated with
severity, giving rise to illusory negative association between
treatment and death).
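The arithmetic of this illustration can be reproduced directly. The sketch below assumes, as the text does, that the treatment is completely without effect, so that treated and untreated patients of each sex die at that sex's own rate; `pooled_rates` is an illustrative name.

```python
from fractions import Fraction as F

def pooled_rates(groups):
    """Death-rates of treated and untreated when the groups are pooled.

    groups: (number of patients, death-rate, share treated) for each sex.
    The treatment is assumed to have no effect at all.
    """
    treated = treated_died = untreated = untreated_died = 0
    for size, death_rate, share_treated in groups:
        n_treated = size * share_treated
        n_untreated = size - n_treated
        treated += n_treated
        untreated += n_untreated
        treated_died += n_treated * death_rate      # same rate as untreated
        untreated_died += n_untreated * death_rate
    return treated_died / treated, untreated_died / untreated

males = (100, F(30, 100), F(80, 100))
females = (100, F(60, 100), F(40, 100))
rate_treated, rate_untreated = pooled_rates([males, females])
print(float(rate_treated), float(rate_untreated))   # 0.4 0.525
```

Within each sex the two rates are equal by construction; only the pooling of records produces the apparent benefit.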

A misleading association between the characters of parent and
offspring might similarly be created if the records for male-male
and female-female lines of descent were mixed. Thus, suppose that 50
per cent. of males and 10 per cent. of females exhibit some
attribute for which there is no association in either line; then we
would have, for each line and for a mixed record of equal
numbers:

        <pb n="71" />

                                                  Male line.     Female line.   Mixed record.
Parents with attribute and children with . .     25 per cent.    1 per cent.    13 per cent.
Parents with attribute and children without .    25 per cent.    9 per cent.    17 per cent.
Parents without attribute and children with .    25 per cent.    9 per cent.    17 per cent.
Parents without attribute and children without   25 per cent.   81 per cent.    53 per cent.
Here 13/30 =43 per cent. of the offspring of parents with the
attribute possess the attribute themselves, but only 17/70 =24
per cent. of the offspring of parents without the attribute. The
association between attribute in parent and attribute in offspring
is, however, due solely to the association of both with male sex.
The student will see that if records for male-female and female-
male lines were mixed, the illusory association would be negative,
and that if all four lines were combined there would be no illusory
association at all.
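The three statements about mixed records can be checked mechanically. This Python sketch assumes, as the text does, equal numbers of records from each line and no association within any line; the function names are illustrative, and in `line_table(p_parent, p_child)` a "male-female" line means a male parent with a female child.

```python
from fractions import Fraction as F

def line_table(p_parent, p_child):
    """Proportions for one line of descent, with no association within it.
    Keys are (parent has attribute, child has attribute)."""
    return {
        (True, True): p_parent * p_child,
        (True, False): p_parent * (1 - p_child),
        (False, True): (1 - p_parent) * p_child,
        (False, False): (1 - p_parent) * (1 - p_child),
    }

def mix(*tables):
    """Pool equal numbers of records from each line."""
    return {k: sum(t[k] for t in tables) / len(tables) for k in tables[0]}

def child_rates(table):
    """Share of children with the attribute among parents with / without it."""
    with_p = table[(True, True)] / (table[(True, True)] + table[(True, False)])
    without_p = table[(False, True)] / (table[(False, True)] + table[(False, False)])
    return with_p, without_p

MALE, FEMALE = F(1, 2), F(1, 10)      # 50 and 10 per cent. with the attribute
mm, ff = line_table(MALE, MALE), line_table(FEMALE, FEMALE)
mf, fm = line_table(MALE, FEMALE), line_table(FEMALE, MALE)

for label, record in [("male-male + female-female", mix(mm, ff)),
                      ("male-female + female-male", mix(mf, fm)),
                      ("all four lines", mix(mm, ff, mf, fm))]:
    print(label, [round(float(r), 3) for r in child_rates(record)])
```

The first mixture reproduces 13/30 against 17/70; the second reverses the sign (1/6 against 5/14); pooling all four lines gives 3/10 in both classes, i.e. no association.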

8. Illusory associations may also arise in a different way
through the personality of the observer or observers. If the
observer’s attention fluctuates, he may be more likely to notice
the presence of A when he notices the presence of B, and vice
versa; in such a case A and B (so far as the record goes) will both
be associated with the observer's attention C, and consequently
an illusory association will be created. Again, if the attributes
are not well defined, one observer may be more generous than
another in deciding when to record the presence of A and also
the presence of B, and even one observer may fluctuate in the
generosity of his marking. In this case the recording of A and
the recording of B will both be associated with the generosity
of the observer in recording their presence, C, and an illusory
association between A and B will consequently arise, as
before.

9. It is important to notice that, though we cannot actually
determine the partial associations unless the third-order frequency
(ABC) is given, we can make some conjecture as to their sign
from the values of the second-order frequencies.

Suppose, for instance, that:
$$(ABC) = \frac{(AC)(BC)}{(C)} + \delta_1, \qquad (AB\gamma) = \frac{(A\gamma)(B\gamma)}{(\gamma)} + \delta_2 \qquad (5)$$
        <pb n="72" />
so that δ₁ and δ₂ are positive or negative according as A and B
are positively or negatively associated in the universes of C's and
γ's respectively. Then we have by addition:
$$(AB) = \frac{(AC)(BC)}{(C)} + \frac{(A\gamma)(B\gamma)}{(\gamma)} + \delta_1 + \delta_2 \qquad (6)$$
Hence if the value of (AB) exceed the value given by the first
two terms (i.e. if δ₁ + δ₂ be positive), A and B must be positively
associated either in the universe of C's, the universe of γ's, or
both. If, on the other hand, (AB) fall short of the value given by
the first two terms, A and B must be negatively associated in
the universe of C's, the universe of γ's, or both. Finally, if
(AB) be equal to the value of the first two terms, A and B must
be positively associated in the one partial universe and negatively
in the other, or else independent in both.

The expression (6) may often be used in the following form,
obtained by dividing through by, say, (B):
$$\frac{(AB)}{(B)} = \frac{(AC)}{(C)}\cdot\frac{(BC)}{(B)} + \frac{(A\gamma)}{(\gamma)}\cdot\frac{(B\gamma)}{(B)} + \frac{\delta_1 + \delta_2}{(B)} \qquad (7)$$
In using this expression we make use solely of proportions or
percentages, and judge of the sign of the partial associations
between A and B accordingly. A concrete case, as in Example iii.
below, is perhaps clearer than the general formula.

Example iii.—(Figures compiled from Supplement to the Fifty-
fifth Annual Report of the Registrar-General [C.—8503], 1897.)
The following are the death-rates per thousand per annum, and the
proportions over 65 years of age, of occupied males in general,
farmers, textile workers, and glass workers (over 15 years of age
in each case) during the decade 1891-1900 in England and Wales.

                                      Death-rate       Proportion per thousand
                                      per thousand.    over 65 Years of Age.
Occupied males over 15 . . . . .         15.8                 46
Farmers        "       . . . . .         19.6                132
Textile workers, males over 15 .         15.9                 34
Glass workers  "       . . . . .         16.6                 16
Would farming, textile working, and glass working seem to be
relatively healthy or unhealthy occupations, given that the death-
rates among occupied males from 15-65 and over 65 years of age
are 11.5 and 102.3 per thousand respectively?
If A denote death, B the given occupation, C old age, we have

        <pb n="73" />
to apply the principle of equation (7). Calculate what would be
the death-rate for each occupation on the supposition that the
death-rates for occupied males in general (11.5, 102.3) apply to
each of its separate age-groups (under 65, over 65), and see
whether the total death-rate so calculated exceeds or falls short
of the actual death-rate. If it exceeds the actual rate, the
occupation must on the whole be healthy; if it falls short, un-
healthy. Thus we have the following calculated death-rates :—

Farmers . . . . . . 11.5 × .868 + 102.3 × .132 = 23.5.
Textile workers . . 11.5 × .966 + 102.3 × .034 = 14.6.
Glass workers . . . 11.5 × .984 + 102.3 × .016 = 13.0.

The calculated rate for farmers largely exceeds the actual rate ;
farming, then, must on the whole, as one would expect, be
a healthy occupation. The death-rate for either young farmers
or old farmers, or both, must be less than for occupied males in
general (the last is actually the case); the high death-rate
observed is due solely to the large proportion of the aged. Textile
working, on the other hand, appears to be unhealthy (14.6 &lt; 15.9),
and glass working still more so (13.0 &lt; 16.6); the actual low total
death-rates are due merely to low proportions of the aged.
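The procedure amounts to applying the two general age-specific rates to each occupation's own age mix, i.e. equation (7) with (AC)/(C) = 102.3 and (Aγ)/(γ) = 11.5 per thousand. A Python sketch under those assumptions (names are illustrative):

```python
# Occupied males: general death-rates per thousand in the two age-groups.
RATE_UNDER_65 = 11.5
RATE_OVER_65 = 102.3

# name: (actual death-rate per thousand, proportion over 65)
occupations = {
    "Farmers": (19.6, 0.132),
    "Textile workers": (15.9, 0.034),
    "Glass workers": (16.6, 0.016),
}

def calculated_rate(share_over_65):
    """Death-rate expected if the general age-specific rates applied."""
    return RATE_UNDER_65 * (1 - share_over_65) + RATE_OVER_65 * share_over_65

for name, (actual, share_over_65) in occupations.items():
    expected = calculated_rate(share_over_65)
    verdict = "healthy" if expected > actual else "unhealthy"
    print(f"{name}: calculated {expected:.1f}, actual {actual}, {verdict}")
```

This prints the calculated rates 23.5, 14.6, and 13.0 alongside the verdicts reached in the text.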

It is evident that age-distributions vary so largely from one
occupation to another that total death-rates are liable to be very
misleading—so misleading, in fact, that they are not tabulated at all
by the Registrar-General ; only death-rates for narrow limits of age
(5 or 10 year age-classes) are worked out. Similar fallacies are
liable to occur in comparisons of local death-rates, owing to
variations not only in the relative proportions of the old, but also
in the relative proportions of the two sexes.

It is hardly necessary to observe that as age is a variable quantity,
the above procedure for calculating the comparative death-rates
is extremely rough. The death-rate of those engaged in any occu-
pation depends not only on the mere proportions over and under
65, but on the relative numbers at every single year of age. The
simpler procedure brings out, however, better than a more complex
one, the nature of the fallacy involved in assuming that crude death-
rates are measures of healthiness. [See also Chap. XI. §§ 17-19.]

Example iv. Eye-colour in grandparent, parent and child.
(The figures are those of Example ii.)

A, light-eyed child; B, light-eyed parent; C, light-eyed grand-
parent.

N = 5008       (AB) = 2524
(A) = 3584     (AC) = 2480
(B) = 3052     (BC) = 2231
(C) = 3178
        <pb n="74" />

Given only the above data, investigate whether there is probably
a partial association between child and grandparent.

If there were no partial association we would have:
$$(AC) = \frac{(AB)(BC)}{(B)} + \frac{(A\beta)(\beta C)}{(\beta)} = \frac{2524 \times 2231}{3052} + \frac{1060 \times 947}{1956} = 1845.0 + 513.2 = 2358.2.$$
Actually (AC) = 2480; there must, then, be partial association
either in the B-universe, the β-universe, or both. In the absence
of any reason to the contrary, it would be natural to suppose there
is a partial association in both; i.e. that there is a partial
association with the grandparent whether the line of descent
passes through “light-eyed” or “not-light-eyed” parents, but this
could not be proved without a knowledge of the class-frequency
(ABC).
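The test of Example iv. is equation (6) with B, rather than C, as the dividing attribute: its first two terms give (AC) as it would be if A and C were independent among both the B's and the β's. A short Python check with exact fractions (the function name is hypothetical):

```python
from fractions import Fraction as F

def ac_without_partial_association(n, a, b, c, ab, bc):
    """(AC) if A and C were independent among the B's and among the betas."""
    a_beta = a - ab        # (A beta)
    beta_c = c - bc        # (beta C)
    beta = n - b           # (beta)
    return F(ab * bc, b) + F(a_beta * beta_c, beta)

N, A, B, C = 5008, 3584, 3052, 3178
AB, AC, BC = 2524, 2480, 2231

expected = ac_without_partial_association(N, A, B, C, AB, BC)
print(round(float(expected), 1))   # 2358.2
print(AC > expected)               # True
```

Since the actual (AC) exceeds the calculated value, δ₁ + δ₂ is positive, and A and C must be positively associated in at least one of the two partial universes.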

10. The total possible number of associations to be derived from
n attributes grows so rapidly with the value of n that the evalua-
tion of them all for any case in which n is greater than four
becomes almost unmanageable. For three attributes there are 9
possible associations: three totals, three partials in positive
universes, and three partials in negative universes. For four
attributes, the number of possible associations rises to 54,
for there are 6 pairs to be formed from four attributes, and
we can find 9 associations for each pair (1 total, 4 partials
with the universe specified by one attribute, and 4 partials
with the universe specified by two). For five attributes the
student will find that there are no fewer than 270, and for six
attributes 1215 associations.
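These figures follow from a simple counting rule: each pair of attributes may be studied in every universe obtained by requiring each remaining attribute to be present, requiring it to be absent, or leaving it out of account. The rule is a restatement of the breakdown given for four attributes (6 × 9 = 54); the Python function name is illustrative.

```python
from math import comb

def total_associations(n):
    """Number of total and partial associations among n attributes.

    Each of the comb(n, 2) pairs may be studied in 3 ** (n - 2) universes,
    since every remaining attribute can be required present, required
    absent, or left out of account.
    """
    return comb(n, 2) * 3 ** (n - 2)

print([total_associations(n) for n in range(3, 7)])   # [9, 54, 270, 1215]
```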

As suggested by Examples i. and ii. above, however, it is not
necessary in any actual case to investigate all the associations
that are theoretically possible ; the nature of the problem indicates
those that are required.

In Example i., for instance, the total and partial associations
between A and D were alone investigated; the associations between
A and B, B and D were not essential for answering the question
that was asked. In Example ii., again, the three total associations
and the partial association between A and C were worked out,
but the partial associations between A and B, B and C were
omitted as unnecessary. Practical considerations of this kind will
always lessen the amount of necessary labour.

        <pb n="75" />

11. It might appear, at first sight, that theoretical considera-
tions would enable us to lessen it still further. As we saw in
Chapter I., all class-frequencies can be expressed in terms of those
of the positive classes, of which there are 2ⁿ in the case of n
attributes. For given values of the n + 1 frequencies N, (A), (B),
(C), . . . of order lower than the second, assigned values of the
positive class-frequencies of the second and higher orders must
therefore correspond to determinate values of all the possible
associations. But the number of these positive class-frequencies
of the second and higher orders is only 2ⁿ − (n + 1); therefore the
number of algebraically independent associations that can be
derived from n attributes is only 2ⁿ − (n + 1). For successive
values of n this gives:

n . . . . . . . .    2    3    4    5    6
2ⁿ − (n + 1) . . .   1    4   11   26   57

Hence if we give data, in any form, that determine four
associations in the case of three attributes, eleven in the case of
four attributes, and so on, in addition to N and the class-frequencies
of the first order, we have done all that is theoretically necessary.
The remaining associations can be deduced.
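The principle that every class-frequency is determined by the positive classes can be sketched as inclusion and exclusion over the attributes required to be absent. The Python below is an illustrative sketch (hypothetical names), using the eye-colour figures of Example ii. with (ABC) = 1928 assumed, as the value implied by the 86.4 per cent. quoted for (ABC)/(BC).

```python
from itertools import combinations

def class_frequency(positive, present, absent):
    """Frequency of any class, e.g. (A beta gamma), from positive classes.

    positive maps frozensets of attribute letters to frequencies; the empty
    frozenset stands for N. present and absent are strings of letters.
    Uses inclusion-exclusion over the attributes that must be absent.
    """
    total = 0
    for k in range(len(absent) + 1):
        for subset in combinations(absent, k):
            total += (-1) ** k * positive[frozenset(present + "".join(subset))]
    return total

def independent_higher_order(n):
    """Positive class-frequencies of the second and higher orders."""
    return 2 ** n - (n + 1)

# Example ii.; (ABC) = 1928 is an assumed, inferred value.
positive = {frozenset(k): v for k, v in {
    "": 5008, "A": 3584, "B": 3052, "C": 3178,
    "AB": 2524, "AC": 2480, "BC": 2231, "ABC": 1928}.items()}

print(class_frequency(positive, "A", "BC"))    # (A beta gamma): 508
print(class_frequency(positive, "", "ABC"))    # (alpha beta gamma): 501
print([independent_higher_order(n) for n in range(2, 7)])   # [1, 4, 11, 26, 57]
```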

12. Practically, however, the mere fact that they can be deduced
is of little help unless such deduction can be effected simply,
indeed almost directly, by mere mental arithmetic, and
this is not the case. The relations that exist between the ratios
or differences, such as (AB) − (AB)₀, that indicate the associations
are, in fact, so complex that an unknown association cannot be
determined from those that are given without more or less lengthy
work; it is not possible to infer even its sign by any simple
process of inspection. We have, for instance, from (5), by the
process used in obtaining (4) for the special case of § 6:
$$\left\{(AB\gamma) - \frac{(A\gamma)(B\gamma)}{(\gamma)}\right\} = \left\{(AB) - (AB)_0\right\} - \frac{N}{(C)(\gamma)}\left\{(AC) - (AC)_0\right\}\left\{(BC) - (BC)_0\right\} - \left\{(ABC) - \frac{(AC)(BC)}{(C)}\right\}$$
which gives us the difference of (ABγ) from the value it would
have if A and B were independent in the universe of γ's in terms
of the difference of (ABC) from the value it would have if A and
        <pb n="76" />

B were independent in the universe of C's, and the corresponding
differences for the frequencies (AB), (AC), and (BC). The four
quantities in the brackets on the right represent, say, the four
known associations, the bracket on the left the unknown association.
Clearly, the relation is not of such a simple kind that the term on
the left can be, in general, mentally evaluated. Hence in con-
sidering the choice and number of associations to be actually
tabulated, regard must be had to practical considerations rather
than to theoretical relations.
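Although the unknown bracket cannot be found by inspection, the relation itself is easily verified by machine. This Python sketch (illustrative names) evaluates both sides with exact fractions for the eye-colour data of Example ii.; as before, (ABC) = 1928 is an assumed value implied by the 86.4 per cent. quoted for (ABC)/(BC).

```python
from fractions import Fraction as F

def both_sides(n, a, b, c, ab, ac, bc, abc):
    """Left and right sides of the relation for (AB gamma)."""
    gamma = n - c
    a_g, b_g, ab_g = a - ac, b - bc, ab - abc     # (A gamma), (B gamma), (AB gamma)
    lhs = ab_g - F(a_g * b_g, gamma)
    rhs = ((ab - F(a * b, n))
           - F(n, c * gamma) * (ac - F(a * c, n)) * (bc - F(b * c, n))
           - (abc - F(ac * bc, c)))
    return lhs, rhs

lhs, rhs = both_sides(5008, 3584, 3052, 3178, 2524, 2480, 2231, 1928)
print(lhs == rhs, round(float(lhs), 1))
```

The equality holds exactly whatever frequencies are supplied; what inspection cannot supply is the sign of the left-hand side, which here depends on the balance of three sizeable terms.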

13. The particular case in which all the 2ⁿ − (n + 1) given associa-
tions are zero is worth some special investigation.

It follows, in the first place, that all other possible associations
must be zero, i.e. that a state of complete independence, as we
may term it, exists. Suppose, for instance, that we are given:
$$(AB) = \frac{(A)(B)}{N}, \qquad (AC) = \frac{(A)(C)}{N}, \qquad (BC) = \frac{(B)(C)}{N}, \qquad (ABC) = \frac{(AC)(BC)}{(C)} = \frac{(A)(B)(C)}{N^2}.$$
Then it follows at once that we have also:
$$(ABC) = \frac{(AB)(BC)}{(B)} = \frac{(AB)(AC)}{(A)},$$
i.e. A and C are independent in the universe of B's, and B and C
in the universe of A's. Again,
$$(AB\gamma) = (AB) - (ABC) = \frac{(A)(B)}{N} - \frac{(A)(B)(C)}{N^2} = \frac{(A)(B)(\gamma)}{N^2} = \frac{(A\gamma)(B\gamma)}{(\gamma)}.$$
Therefore A and B are independent in the universe of γ's.
Similarly, it may be shown that A and C are independent in the
universe of β's, B and C in the universe of α's.

In the next place it is evident from the above that relations of
the general form (to write the equation symmetrically)
$$\frac{(ABC)}{N} = \frac{(A)}{N}\cdot\frac{(B)}{N}\cdot\frac{(C)}{N} \qquad (8)$$
must hold for every class-frequency. This relation is the general
form of the equation of independence, (2) (d), Chap. III. (p. 26).
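A state of complete independence is easily constructed and tested. The Python sketch below (function names hypothetical) builds all eight third-order frequencies from relation (8), then confirms that the nine associations of § 10 (three total, six partial) all vanish. The marginal proportions used are arbitrary illustrative values, not data.

```python
from fractions import Fraction as F
from itertools import product

NAMES = "ABC"

def independent_table(n, shares):
    """Cells {(A?, B?, C?): frequency} satisfying (8) for every class."""
    table = {}
    for signs in product([True, False], repeat=3):
        p = F(1)
        for share, present in zip(shares, signs):
            p *= share if present else 1 - share
        table[signs] = n * p
    return table

def freq(table, **fixed):
    """Sum of the cells agreeing with fixed, e.g. freq(t, A=True, C=False)."""
    return sum(v for k, v in table.items()
               if all(k[NAMES.index(name)] == want for name, want in fixed.items()))

def all_associations_vanish(table):
    """Check every pair in the whole universe and in both partial universes."""
    for x, y in [("A", "B"), ("A", "C"), ("B", "C")]:
        z = next(name for name in NAMES if name not in (x, y))
        for universe in ({}, {z: True}, {z: False}):
            total = freq(table, **universe)
            both = freq(table, **{x: True, y: True}, **universe)
            with_x = freq(table, **{x: True}, **universe)
            with_y = freq(table, **{y: True}, **universe)
            if both * total != with_x * with_y:
                return False
    return True

table = independent_table(1000, [F(3, 10), F(1, 2), F(1, 5)])
print(all_associations_vanish(table))   # True
```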
14. It must be noted, however, that (8) is not a criterion for the
        <pb n="77" />
complete independence of A, B, and C in the sense that the
equation
$$\frac{(AB)}{N} = \frac{(A)}{N}\cdot\frac{(B)}{N}$$
is a criterion for the complete independence of A and B. If we
are given N, (A), and (B), and the last relation quoted holds
good, we know that similar relations must hold for (Aβ), (αB),
and (αβ). If N, (A), (B), and (C) be given, however, and the
equation (8) hold good, we can draw no conclusion without
further information; the data are insufficient. There are eight
algebraically independent class-frequencies in the case of three
attributes, while N, (A), (B), (C) are only four: the equation (8)
must therefore be shown to hold good for four frequencies of the
third order before the conclusion can be drawn that it holds good
for the remainder, i.e. that a state of complete independence
subsists. The direct verification of this result is left for the
student.

Quite generally, if N, (A), (B), (C), . . . be given, the relation
$$\frac{(ABC\ldots)}{N} = \frac{(A)}{N}\cdot\frac{(B)}{N}\cdot\frac{(C)}{N}\cdots \qquad (9)$$
must be shown to hold good for 2ⁿ − (n + 1) of the nth-order classes
before it may be assumed to hold good for the remainder. It is
only because
$$2^n - (n + 1) = 1$$
when n = 2 that the relation
$$\frac{(AB)}{N} = \frac{(A)}{N}\cdot\frac{(B)}{N}$$
may be treated as a criterion for the independence of A and B.

If all the n (n &gt; 2) attributes are completely independent, the
relation (9) holds good; but it does not follow that if the relation
(9) hold good they are all independent.
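A small constructed table shows why (8) holding for (ABC) alone proves nothing. The counts below are hypothetical, chosen only for illustration: relation (8) is satisfied by (ABC), yet A and B are associated and (8) fails for (αβγ). The Python helper name is illustrative.

```python
from fractions import Fraction as F

# Hypothetical counts; keys are (A?, B?, C?) with True = attribute present.
cells = {
    (True, True, True): 1,   (True, True, False): 2,
    (True, False, True): 1,  (True, False, False): 0,
    (False, True, True): 1,  (False, True, False): 0,
    (False, False, True): 1, (False, False, False): 2,
}

def freq(**fixed):
    """Sum of the cells agreeing with fixed, e.g. freq(A=True, B=False)."""
    idx = {"A": 0, "B": 1, "C": 2}
    return sum(v for k, v in cells.items()
               if all(k[idx[name]] == want for name, want in fixed.items()))

N = freq()
pA, pB, pC = (F(freq(**{x: True}), N) for x in "ABC")

holds_for_ABC = F(freq(A=True, B=True, C=True), N) == pA * pB * pC
holds_for_alpha_beta_gamma = (F(freq(A=False, B=False, C=False), N)
                              == (1 - pA) * (1 - pB) * (1 - pC))
A_B_independent = F(freq(A=True, B=True), N) == pA * pB
print(holds_for_ABC, holds_for_alpha_beta_gamma, A_B_independent)   # True False False
```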

REFERENCES.

(1) Yule, G. U., “On the Association of Attributes in Statistics,” Phil.
Trans. Roy. Soc., Series A, vol. cxciv., 1900, p. 257. (Deals fully
with the theory of partial as well as of total association, with numerous
illustrations: a notation suggested for the partial coefficients.)

(2) Yule, G. U., “Notes on the Theory of Association of Attributes in
Statistics,” Biometrika, vol. ii., 1903, p. 121. (Cf. especially §§ 4 and
5, on the theory of complete independence, and the fallacies due to
mixing of records.)

        <pb n="78" />
EXERCISES.

1. Take the following figures for girls corresponding to those for boys in
Example i., p. 45, and discuss them similarly, but not necessarily using
exactly the same comparisons, to see whether the conclusion that “the
connecting link between defects of body and mental dulness is the coincident
defect of brain which may be known by observation of abnormal nerve signs”
seems to hold good.

A, development defects. B, nerve signs. D, mental dulness.

N      10,000     (AB)    248
(A)       682     (AD)    307
(B)       850     (BD)    363
(D)       689     (ABD)   128

2. (Material from Census of England and Wales, 1891, vol. iii.) The
following figures give the numbers of those suffering from single or combined
infirmities: (1) for all males, (2) for males of 55 years of age and over.

A, Blindness. B, Mental derangement. C, Deaf-mutism.

          (1) All Males.   (2) Males 55 and over.            (1) All Males.   (2) Males 55 and over.
N           14,053,000          1,377,000           (AB)          183               65
(A)             12,281              5,538           (AC)           51               14
(B)             45,392             10,309           (BC)          299               47
(C)              7,707                746           (ABC)          11                3

Tabulate proportions per thousand, exhibiting the total association between
blindness and mental derangement, and the partial association between the
same two infirmities among deaf-mutes, (1) for males in general, (2) for those
of 55 years of age or over. Give a short verbal statement of the results, and
contrast them with those of Question 1.

3. (Material from Supplement to 55th Annual Report Reg.-Genl.)

The death-rate from cancer for occupied males in general (over 15) is
0.685 per thousand per annum, and for farmers 1.20.

The death-rates from cancer for occupied males under and over 45 are
0.13 and 2.25 respectively. Of the farmers 46.1 per cent. are over
45.

Would you say that farmers were peculiarly liable to cancer?

4. A population of males over 15 years of age consists of 7 per cent. over 65
years of age and 93 per cent. under. The death-rates are 12 per thousand per
annum in the younger class and 110 in the older, or 18.86 in the whole
population. The death-rate of males (over 15) engaged in a certain industry
is 26.7 per thousand.

If the industry be not unhealthy, what must be the approximate proportion
of those over 65 engaged in it (neglecting minor differences of age
distribution)?

5. Show that if A and B are independent, while A and C, B and C are
associated, A and B must be disassociated either in the universe of C's,
the universe of γ's, or both.

6. As an illustration of Question 5, show that if the following were actual
data, there would be a slight disassociation between the eye-colours of
husband and wife (father and mother) for the parents either of light-eyed
sons or not-light-eyed sons, or both, although there is a slight positive
association for parents at large.

        <pb n="79" />

A, light eye-colour in husband; B, in wife; C, in son:

N      1000     (AB)   358
(A)     622     (AC)   471
(B)     558     (BC)   419
(C)     617

7. Show that if (ABC) = (αβγ), (αBC) = (Aβγ), and so on (the case of
“complete equality of contrary frequencies” of Question 7, Chap. I.), A, B,
and C are completely independent if A and B, A and C, B and C are inde-
pendent pair and pair.

8. If, in the same case of complete equality of contraries,
(AB) − N/4 = δ₁,
(AC) − N/4 = δ₂,
(BC) − N/4 = δ₃;
show that
$$(ABC) - \frac{(AC)(BC)}{(C)} = (AB\gamma) - \frac{(A\gamma)(B\gamma)}{(\gamma)} = \frac{\delta_1}{2} - \frac{2\delta_2\delta_3}{N},$$
so that the partial associations between A and B in the universes C and γ are
positive or negative according as
$$\delta_1 \gtrless \frac{4\delta_2\delta_3}{N}.$$

9. In the simple contests of a general election (contests in which one
Conservative opposed one Liberal and there were no other candidates) 66 per
cent. of the winning candidates (according to the returns) spent more money
than their opponents. Given that 63 per cent. of the winners were Con-
servatives, and that the Conservative expenditure exceeded the Liberal in 80
per cent. of the contests, find the percentages of elections won by Conservatives
(1) when they spent more and (2) when they spent less than their opponents,
and hence say whether you consider the above figures evidence of the influence
of expenditure on election results or no. (Note that if the one candidate in a
contest be a Conservative-winner-who-spends-more-than-his-opponent, the
other must necessarily be a Liberal-loser-who-spends-less, and so forth.
Hence the case is one of complete equality of contraries.)

10. Given that (A)/N = (B)/N = (C)/N = x, and that (AB)/N = (AC)/N = y,
find the major and minor limits to y that enable one to infer positive associa-
tion between B and C, i.e. (BC)/N &gt; x².

Draw a diagram on squared paper to illustrate your answer, taking x and y
as co-ordinates, and shading the limits within which y must lie in order to
permit of the above inference. Point out the peculiarities in the case of in-
ferring a positive association from two negative associations.

11. Discuss similarly the more complex case (A)/N = x, (B)/N = 2x, (C)/N =
3x:

(1) for inferring positive association between B and C given (AB)/N =
(AC)/N = y.

(2) for inferring positive association between A and C given (AB)/N =
(BC)/N = y.

(3) for inferring positive association between A and B given (AC)/N =
(BC)/N = y.

        <pb n="80" />
        CHAPTER V.
MANIFOLD CLASSIFICATION.

1. The general principle of a manifold classification—2-4. The table of
double-entry or contingency table and its treatment by fundamental
methods—5-8. The coefficient of contingency—9-10. Analysis of
a contingency table by tetrads—11-13. Isotropic and anisotropic
distributions—14-15. Homogeneity of the classifications dealt with
in this and the preceding chapters: heterogeneous classifications.

1. Classification by dichotomy is, as was briefly pointed out in
Chap. I. § 5, a simpler form of classification than usually occurs
in the tabulation of practical statistics. It may be regarded as
a special case of a more general form in which the individuals or
objects observed are first divided under, say, s heads, A₁, A₂, . . .
Aₛ, each of the classes so obtained then subdivided under t heads,
B₁, B₂, . . . Bₜ, each of these under u heads, C₁, C₂, . . . Cᵤ, and
so on, thus giving rise to s·t·u . . . ultimate classes altogether.

2. The general theory of such a manifold as distinct from a
twofold or dichotomous classification, in the case of n attributes
or characters ABC . . . N, would be extremely complex: in the
present chapter the discussion will be confined to the case of two
characters, A and B, only. If the classification of the A's be s-
fold and of the B's t-fold, the frequencies of the st classes of the
second order may be most simply given by forming a table with
s columns headed A₁ to Aₛ and t rows headed B₁ to Bₜ. The
number of the objects or individuals possessing any combination
of the two characters, say Aₘ and Bₙ, i.e. the frequency of the
class AₘBₙ, is entered in the compartment common to the mth
column and the nth row, the st compartments thus giving all
the second-order frequencies. The totals at the ends of rows
and the feet of columns give the first-order frequencies, i.e. the
numbers of Aₘ's and Bₙ's, and finally the grand total at the
right-hand bottom corner gives the whole number of observations.

Tables I. and II. below will serve as illustrations of such tables
of double entry, or contingency tables, as they have been termed
by Professor Pearson (ref. 1).
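Forming the marginal totals is purely mechanical. The Python sketch below (the helper name is illustrative) appends row totals, and a final row of column totals ending in the grand total, to the body of Table I. below, whose figures are in thousands of houses.

```python
# Body of Table I. (000's omitted): rows are London, other urban districts,
# rural districts; columns are inhabited, uninhabited, building.
counts = [
    [571, 40, 5],
    [4064, 285, 45],
    [1625, 124, 12],
]

def with_margins(table):
    """Append row totals, then a final row of column totals and grand total."""
    out = [row + [sum(row)] for row in table]
    col_totals = [sum(col) for col in zip(*out)]
    return out + [col_totals]

for line in with_margins(counts):
    print(line)
```

The last printed row, [6260, 449, 62, 6771], is the "Total for England and Wales" line of the table.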
        <pb n="81" />
3. In Table I. the division is 3 × 3-fold: the houses in England
and Wales are divided into those which are in (1) London, (2)
other urban districts, (3) rural districts, and the houses in each
of these divisions are again classified into (1) inhabited houses,
(2) uninhabited but completed houses, (3) houses that are
“building,” i.e. in course of erection. Thus from the first row
we see that there were in London, in round numbers, 616,000
houses, of which 571,000 were inhabited, 40,000 uninhabited,
and 5000 in course of erection: from the first column, there
were 6,260,000 inhabited houses in England and Wales, of which
571,000 were in London, 4,064,000 in other urban districts, and
1,625,000 in rural districts.
TABLE I. Houses in England and Wales. (Census of 1901.
Summary Table X.) (000's omitted.)

                                  Inhabited.   Uninhabited.   Building.   Total.
Adm. County of London . . .          571            40             5        616
Other urban districts . . .         4064           285            45       4394
Rural districts . . . . . .         1625           124            12       1761
Total for England and Wales         6260           449            62       6771
In Table II., on the other hand, the classification is 3 × 4-fold:
the eye-colours are classed under the three heads “blue,” “grey or
green,” and “brown,” while the hair-colours are classed under
four heads, “fair,” “brown,” “black,” and “red.” The table is
TABLE II. Hair- and Eye-Colours of 6800 Males in Baden.
(Ammon, Zur Anthropologie der Badener.)

                              Hair-colour.
Eye-colour.             Fair.   Brown.   Black.   Red.   Total.
Blue . . . . . . .      1768      807      189      47     2811
Grey or green . . .      946     1387      746      53     3132
Brown . . . . . . .      115      438      288      16      857
Total . . . . . . .     2829     2632     1223     116     6800
        <pb n="82" />

read similarly to the last. Taking the first row, it tells us that
there were 2811 men with blue eyes noted, of whom 1768 had
fair hair, 807 brown hair, 189 black hair, and 47 red hair.
Similarly, from the first column, there were 2829 men with fair
hair, of whom 1768 had blue eyes, 946 grey or green eyes, and
115 brown eyes. The tables are a generalised form of the four-
fold (2 × 2-fold) tables in § 13, Chap. III.

4. For the purpose of discussing the nature of the relation
between the A's and the B's, any such table may be treated on
the principles of the preceding chapters by reducing it in different
ways to 2 × 2-fold form. It then becomes possible to trace the
association between any one or more of the A's and any one or
more of the B's, either in the universe at large or in universes
limited by the omission of one or more of the A's, of the B's, or
of both. Taking Table I., for example, let us trace the association
between the erection of houses and the urban character of a
district. Adding together the first two rows, i.e. pooling London
and the other urban districts together, and similarly adding the
first two columns, so as to make no distinction between inhabited
and uninhabited houses as long as they are completed, we find:

Proportion of all houses which are in course of erection in
urban districts . . . . . . . . . . 50/5010 = 10 per thousand.
Proportion of all houses which are in course of erection in
rural districts . . . . . . . . . . 12/1761 = 7 per thousand.

There is therefore, as might be expected, a distinct positive
association, a larger proportion of houses being in course of
erection in urban than in rural districts.

If, as another illustration, it be desired to trace the association
between the “uninhabitedness” of houses and the urban character
of the district, the procedure will be rather different. Rows 1
and 2 may be added together as before, but column 3 may be
omitted altogether, as the houses which are only in course of
erection do not enter into the question. We then have—

Proportion of all houses which are uninhabited in urban
districts . . . . . . . . . . . . 325/4960 = 66 per thousand.
Proportion of all houses which are uninhabited in rural
districts . . . . . . . . . . . . 124/1749 = 71 per thousand.
The association is therefore negative, the proportion of houses
uninhabited being greater in rural than in urban districts.
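Both reductions to 2 × 2-fold form amount to pooling rows and choosing which columns enter the numerator and the denominator. A Python sketch using the Table I. figures (names are illustrative):

```python
counts = {
    "London": {"inhabited": 571, "uninhabited": 40, "building": 5},
    "Other urban": {"inhabited": 4064, "uninhabited": 285, "building": 45},
    "Rural": {"inhabited": 1625, "uninhabited": 124, "building": 12},
}

def per_thousand(districts, numerator_cols, denominator_cols):
    """Pool the given rows, then express the numerator columns per thousand
    of the denominator columns."""
    top = sum(counts[d][col] for d in districts for col in numerator_cols)
    bottom = sum(counts[d][col] for d in districts for col in denominator_cols)
    return 1000 * top / bottom

URBAN, RURAL = ["London", "Other urban"], ["Rural"]
ALL_HOUSES = ["inhabited", "uninhabited", "building"]
COMPLETED = ["inhabited", "uninhabited"]        # column 3 omitted

print(round(per_thousand(URBAN, ["building"], ALL_HOUSES)),
      round(per_thousand(RURAL, ["building"], ALL_HOUSES)))      # 10 7
print(round(per_thousand(URBAN, ["uninhabited"], COMPLETED)),
      round(per_thousand(RURAL, ["uninhabited"], COMPLETED)))    # 66 71
```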

        <pb n="83" />
The eye- and hair-colour data of Table II. may be treated in a
precisely similar fashion. If, e.g., we desire to trace the associa-
tion between a lack of pigmentation in eyes and in hair, rows 1
and 2 may be pooled together as representing the least pigmenta-
tion of the eyes, and columns 2, 3, and 4 may be pooled together
as representing hair with a more or less marked degree of
pigmentation. We then have—

Proportion of light-eyed with fair hair = 2714/5943 = 46 per cent.
Proportion of brown-eyed with fair hair = 115/857 = 13 per cent.
The association is therefore well-marked. For comparison we
may trace the corresponding association between the most marked
degree of pigmentation in eyes and hair, i.e. brown eyes and
black hair. Here we must add together rows 1 and 2 as before,
and columns 1, 2, and 4—the column for red being really mis-
placed, as red represents a comparatively slight degree of pigmenta-
tion. The figures are—
Proportion of brown-eyed with black hair = 288/857 = 34 per cent.
Proportion of light-eyed with black hair = 935/5943 = 16 per cent.

The association is again positive and well-marked, but the
difference between the two percentages is rather less than in the
last case.
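The reductions above are nothing more than additions over rows and columns. The following is a minimal Python sketch. It assumes the Table II frequencies as they appear in the worked example of § 8, with eye-colour rows Blue, Grey or Green, and Brown, and hair-colour columns Fair, Brown, Black, and Red. The name `pooled_proportion` is illustrative only.

```python
# Table II: eye-colour (rows) by hair-colour (columns), frequencies as
# used in the worked example of section 8.
HAIR = ["Fair", "Brown", "Black", "Red"]
TABLE_II = {
    "Blue": [1768, 807, 189, 47],
    "Grey or Green": [946, 1387, 746, 53],
    "Brown": [115, 438, 288, 16],
}


def pooled_proportion(table, rows, cols):
    """Pool the given rows and columns by addition and return
    (frequency in the pooled columns, total of the pooled rows, proportion)."""
    col_idx = [HAIR.index(c) for c in cols]
    hits = sum(table[r][i] for r in rows for i in col_idx)
    total = sum(sum(table[r]) for r in rows)
    return hits, total, hits / total


LIGHT = ["Blue", "Grey or Green"]  # rows 1 and 2 pooled
cases = [
    ("light-eyed with fair hair", LIGHT, ["Fair"]),
    ("brown-eyed with fair hair", ["Brown"], ["Fair"]),
    ("brown-eyed with black hair", ["Brown"], ["Black"]),
    ("light-eyed with black hair", LIGHT, ["Black"]),
]
for label, rows, cols in cases:
    hits, total, p = pooled_proportion(TABLE_II, rows, cols)
    print(f"Proportion of {label}: {hits}/{total} = {round(100 * p)} per cent.")
```

Running it prints the four percentages quoted in the text: 46, 13, 34 and 16 per cent.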

5. The mode of treatment adopted in the preceding section rests
on first principles, and, if fully carried out, it gives the most detailed
information possible with regard to the relations of the two attri-
butes. At the same time a distinct need is felt in practical work for
some more summary method—a method which will enable a single
and definite answer to be given to such a question as—Are the
A’s on the whole distinctly dependent on the B’s; and if so, is this
dependence very close, or the reverse? The subject of coefficients
of association, which affords the answer to this question in the
case of a dichotomous classification, was only dealt with briefly
and incidentally, for it is still the subject of some controversy;
further, where there are only four classes of the second order
to be considered the matter is not nearly so complex as where
the number is, say, twenty-five or more, and the need for
any summary coefficient is not so often nor so keenly felt. The
ideas on which Professor Pearson’s general measure of de-
pendence, the “coefficient of contingency,” is based, are, more-
over, quite simple and fundamental, and the mode of calculation

        <pb n="84" />
        THEORY OF STATISTICS.
is therefore given in full in the following section. The advanced
student should refer to the original memoir (ref. 1) for a completer
treatment of the theory of the coefficient, and of its relation to
the theory of variables.

6. Generalising slightly the notation of the preceding chapters, let the frequency of A_m’s be denoted by (A_m), the frequency of B_n’s by (B_n), and the frequency of objects or individuals possessing both characters by (A_mB_n). Then, if the A’s and B’s be completely independent in the universe at large, we must have for all values of m and n—

$$(A_mB_n) = \frac{(A_m)(B_n)}{N} = (A_mB_n)_0 \qquad (1)$$

If, however, A and B are not completely independent, (A_mB_n) and (A_mB_n)_0 will not be identical for all values of m and n. Let the difference be given by

$$\delta_{mn} = (A_mB_n) - (A_mB_n)_0 \qquad (2)$$

A coefficient such as we are seeking may evidently be based in some way on these values of δ. It will not do, however, simply to add them together, for the sum of all the values of δ, some of which are negative and others positive, must be zero in any case, the sum of both the (A_mB_n)’s and the (A_mB_n)_0’s being equal to the whole number of observations N. It is necessary, therefore, to get rid of the signs, and this may be done in two simple ways: (1) by neglecting them and forming the arithmetical instead of the algebraical sum of the differences δ, or (2) by squaring the differences and then summing the squares. The first process is the shorter, but the second the better, as it leads to a coefficient easily treated by algebraical methods, which the first process does not: as the student will see later, squaring is very usefully and very frequently employed for the purpose of eliminating algebraical signs. Suppose, then, that every δ is calculated, and also the ratio of its square to the corresponding value of (A_mB_n)_0, and that the sum of all such ratios is, say, χ²; or, in symbols, using Σ to denote “the sum of all quantities like”:—

$$\chi^2 = \sum \frac{\delta_{mn}^2}{(A_mB_n)_0} \qquad (3)$$
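The point that the δ’s always sum to zero, while their squares do not cancel, is easy to check numerically. The following is a minimal sketch. It assumes the Table II frequencies of § 8 and uses Python’s `fractions.Fraction` so that the zero sum is exact rather than merely approximate. `compare_with_independence` is a hypothetical helper name.

```python
from fractions import Fraction

# Table II frequencies (rows: eye-colours; columns: hair-colours), as in section 8.
TABLE_II = [
    [1768, 807, 189, 47],
    [946, 1387, 746, 53],
    [115, 438, 288, 16],
]


def compare_with_independence(table):
    """Return (observed, independence value, delta) for every compartment.

    The independence value is (A_m)(B_n)/N, eq. (1); delta is eq. (2).
    """
    n = sum(map(sum, table))
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    cells = []
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = Fraction(row_tot[i] * col_tot[j], n)
            cells.append((obs, expected, obs - expected))
    return cells


cells = compare_with_independence(TABLE_II)
algebraic = sum(d for _, _, d in cells)           # always exactly zero
arithmetical = sum(abs(d) for _, _, d in cells)   # process (1): signs neglected
chi2 = sum(d * d / e for _, e, d in cells)        # process (2), eq. (3)
print("algebraic sum of deltas:", algebraic)
print("arithmetical sum of deltas:", round(float(arithmetical), 1))
print("chi-squared by eq. (3):", round(float(chi2), 1))
```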

Being the sum of a series of squares, each divided by a positive quantity, χ² can never be negative, and if A and B be independent it is zero, because every δ is zero. If, then, we form a coefficient C given by the relation

$$C = \sqrt{\frac{\chi^2}{N + \chi^2}} \qquad (4)$$
        <pb n="85" />
this coefficient is zero if the characters A and B are completely
independent, and approaches more and more nearly towards
unity as χ² increases. In general, no sign should be attached
to the root, for the coefficient simply shows whether the two
characters are or are not independent, and nothing more, but in
some cases a conventional sign may be used. Thus in Table II.
slight pigmentation of eyes and of hair appear to go together,
and the contingency may be regarded as definitely positive. If
slight pigmentation of eyes had been associated with marked
pigmentation of hair, the contingency might have been regarded
as negative. C is Professor Pearson’s mean square contingency coefficient.¹

¹ Professor Pearson (ref. 1) terms δ a sub-contingency; χ² the square contingency; the ratio χ²/N, which he denotes by φ², the mean square contingency; and the sum of all the δ’s of one sign only, on which a different coefficient can be based, the mean contingency.

7. The coefficient, in the simple form (4), has one disadvantage, viz. that coefficients calculated on different systems of classification are not comparable with each other. It is clearly desirable for practical purposes that two coefficients calculated from the same data classified in two different ways should be, at least approximately, identical. With the present coefficient this is not the case: if certain data be classified in, say, (1) 6 × 6-fold, (2) 3 × 3-fold form, the coefficient in the latter form tends to be the less. The greatest possible value of the coefficient is, in fact, unity only if the number of classes be infinitely great; for any finite number of classes the limiting value of C is the smaller the smaller the number of classes. This may be briefly illustrated as follows. Replacing δ_mn in equation (3) by its value in terms of (A_mB_n) and (A_mB_n)_0, and remembering that Σ(A_mB_n) and Σ(A_mB_n)_0 are both equal to N, we have—

$$\chi^2 = \sum \left\{ \frac{(A_mB_n)^2}{(A_mB_n)_0} \right\} - N \qquad (5)$$

and therefore, denoting the sum of the expressions in brackets by S,

$$C = \sqrt{\frac{S - N}{S}} \qquad (6)$$

Now suppose we have to deal with a t × t-fold classification in which (A_m) = (B_m) for all values of m; and suppose, further, that the association between A_m and B_m is perfect, so that (A_mB_m) = (A_m) = (B_m) for all values of m, the remaining frequencies of the second order being zero; all the frequency is then concentrated in the diagonal compartments of the table, and each contributes
        <pb n="86" />
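N to the sum S, since (A_mB_m)²/(A_mB_m)_0 = (A_m)²/{(A_m)(B_m)/N} = N. The total value of S is accordingly tN, and the value of C—

$$C = \sqrt{\frac{tN - N}{tN}} = \sqrt{\frac{t - 1}{t}}$$

This is the greatest possible value of C for a symmetrical t × t-fold classification, and therefore, in such a table, for—

t = 2,   C cannot exceed 0.707
t = 3,   C cannot exceed 0.816
t = 4,   C cannot exceed 0.866
t = 5,   C cannot exceed 0.894
t = 6,   C cannot exceed 0.913
t = 7,   C cannot exceed 0.926
t = 8,   C cannot exceed 0.935
t = 9,   C cannot exceed 0.943
t = 10,  C cannot exceed 0.949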
It is as well, therefore, to restrict the use of the “coefficient of
contingency ” to 5 x 5-fold or finer classifications. At the same
time the classification must not be made too fine, or else the value
of the coefficient is largely affected by casual irregularities of no
physical significance in the class-frequencies (cf. the remarks in
Chap. III. §§ 7-8).
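The limiting values just tabulated can be checked with a hypothetical t × t table that has all its frequency on the diagonal, computing C from eq. (4). The following is a minimal sketch. The diagonal class sizes are arbitrary and deliberately unequal, to show that the limit depends only on t. The helper names are illustrative.

```python
import math


def contingency_c(table):
    """Pearson's C by eq. (4): sqrt(chi2 / (N + chi2))."""
    n = sum(map(sum, table))
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n    # (A_mB_n)_0
            chi2 += (obs - expected) ** 2 / expected  # eq. (3)
    return math.sqrt(chi2 / (n + chi2))


def perfect_diagonal(sizes):
    """Hypothetical t x t table with all the frequency on the diagonal,
    so that (A_m) = (B_m) and the association is perfect."""
    t = len(sizes)
    return [[sizes[i] if i == j else 0 for j in range(t)] for i in range(t)]


for t in range(2, 11):
    sizes = [10 * (k + 1) for k in range(t)]  # deliberately unequal classes
    c = contingency_c(perfect_diagonal(sizes))
    print(f"t = {t:2d}: C = {c:.3f}, sqrt((t - 1)/t) = {math.sqrt((t - 1) / t):.3f}")
```

Both printed columns agree with the limiting values given above.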
TABLE III.—Independence-Values of the Frequencies for Table II.

Eye-colour.         Fair.    Brown.   Black.   Red.
Blue                1169     1088     506      48.0
Grey or Green       1303     1212     563      53.4
Brown                357      332     154      14.6
8. As the classification of Table II. is only 3 × 4-fold, it is rather crude for the purpose of calculating the coefficient, but will serve simply as an illustration of the form of the arithmetic. In Table III. are given the values of the independence frequencies, 2829 × 2811/6800 = 1169, and so on. The value of χ² is more readily calculated from equation (5) than from (3):—

        <pb n="87" />
(1768)²/1169 = 2673.9
(946)²/1303 = 686.8
(115)²/357 = 37.0
(807)²/1088 = 598.6
(1387)²/1212 = 1587.3
(438)²/332 = 577.8
(189)²/506 = 70.6
(746)²/563 = 988.5
(288)²/154 = 538.6
(47)²/48.0 = 46.0
(53)²/53.4 = 52.6
(16)²/14.6 = 17.5
Total, S = 7875.2

χ² = S − N = 7875.2 − 6800 = 1075.2

C = √(1075.2/7875.2) = 0.37

The squares in such work may conveniently be taken from Barlow’s Tables of Squares, Cubes, etc. (see list of tables), or logarithms may be used throughout; five-figure logarithms are quite sufficient.
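The arithmetic of this section can be reproduced as follows. The following is a minimal sketch. It assumes the Table II frequencies shown in the squared terms above. It computes S and χ² by equation (5), cross-checks χ² against equation (3), and forms C by equation (6).

Because the sketch keeps the independence frequencies unrounded, instead of rounding them as in Table III., its S and χ² come out slightly below the hand-worked 7875.2 and 1075.2. C still rounds to 0.37.

```python
import math

# Table II frequencies: rows Blue, Grey or Green, Brown (eye-colour);
# columns Fair, Brown, Black, Red (hair-colour).
TABLE_II = [
    [1768, 807, 189, 47],
    [946, 1387, 746, 53],
    [115, 438, 288, 16],
]


def independence_values(table):
    """Return the table of (A_m)(B_n)/N values (eq. 1) and N."""
    n = sum(map(sum, table))
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    return [[r * c / n for c in col_tot] for r in row_tot], n


def contingency(table):
    """Return S, chi-squared and Pearson's C for a contingency table."""
    expected, n = independence_values(table)
    pairs = [(o, e) for orow, erow in zip(table, expected)
             for o, e in zip(orow, erow)]
    s = sum(o * o / e for o, e in pairs)                  # the sum S of eq. (5)
    chi2 = s - n                                          # eq. (5)
    chi2_direct = sum((o - e) ** 2 / e for o, e in pairs)  # eq. (3)
    assert math.isclose(chi2, chi2_direct)
    c = math.sqrt((s - n) / s)                            # eq. (6), same as eq. (4)
    return s, chi2, c


s, chi2, c = contingency(TABLE_II)
print(f"S = {s:.1f}, chi-squared = {chi2:.1f}, C = {c:.2f}")
```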

9. While such a coefficient of contingency, in some form or
other, is a great convenience in many fields of work, its use
should not lead to a neglect of those details which a treatment by
the elementary methods of § 4 would have revealed. Whether
the coefficient be calculated or no, every table should always be
examined with care to see if it exhibit any apparently significant
peculiarities in the distribution of frequency, e.g. in the associa-
tions subsisting between A_m and B_n in limited universes. A good
deal of caution must be used in order not to be misled by casual
irregularities due to paucity of observations in some compartments
of the table, but important points that would otherwise be over-
looked will often be revealed by such a detailed examination.

10. Suppose, for example, that any four adjacent frequencies, say—

(A_mB_n)          (A_{m+1}B_n)
(A_mB_{n+1})      (A_{m+1}B_{n+1})

are extracted from the general contingency table. Considering these as a table exhibiting the association between A_m and B_n in a universe limited to A_m, A_{m+1}, B_n and B_{n+1} alone, the association is positive, negative, or zero according as (A_mB_n)/(A_{m+1}B_n) is greater
        <pb n="88" />
than, less than, or equal to the ratio (A_mB_{n+1})/(A_{m+1}B_{n+1}). The whole of the contingency table can be analysed into a series of elementary groups of four frequencies like the above, each one overlapping its neighbours, so that an r × s-fold table contains (r − 1)(s − 1) such “tetrads,” and the associations in them all can be very quickly determined by simply tabulating the ratios like (A_mB_n)/(A_{m+1}B_n), (A_mB_{n+1})/(A_{m+1}B_{n+1}), etc., or perhaps better, the proportions (A_mB_n)/{(A_mB_n) + (A_{m+1}B_n)}, etc., for every pair of columns or of rows, as may be most convenient. Taking the
figures of Table II. as an illustration, and working from the
rows, the proportions run as follows :—
For rows 1 and 2.            For rows 2 and 3.
1768/2714 = 0.651            946/1061 = 0.892
807/2194 = 0.368             1387/1825 = 0.760
189/935 = 0.202              746/1034 = 0.721
47/100 = 0.470               53/69 = 0.768

In both cases the first three ratios form descending series, but
the fourth ratio is greater than the second. The signs of the
associations in the six tetrads are accordingly—

+   +   −
+   +   −

The negative sign in the two tetrads on the right is striking,
the more so as other tables for hair- and eye-colour, arranged in
the same way, exhibit just the same characteristic. But the
peculiarity will be removed at once if the fourth column be placed
immediately after the first: if this be done, i.e. if “red” be placed
between “fair” and “brown ” instead of at the end of the colour-
series, the sign of the association in all the elementary tetrads
will be the same. The colours will then run fair, red, brown,
black, and this would seem to be the more natural order, consider-
ing the depth of the pigmentation.
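The tetrad test of this section reduces to comparing two cross-products. The tetrad on rows r, r + 1 and columns m, m + 1 is positive when (r, m)(r + 1, m + 1) exceeds (r, m + 1)(r + 1, m), which is the same comparison as the descending ratios above. The following is a minimal sketch, assuming the Table II frequencies of § 8. The helper names are illustrative.

```python
# Table II frequencies, hair-colour columns in the printed order
# Fair, Brown, Black, Red; rows Blue, Grey or Green, Brown.
TABLE_II = [
    [1768, 807, 189, 47],
    [946, 1387, 746, 53],
    [115, 438, 288, 16],
]


def sign(d):
    """Return '+', '-' or '0' according to the sign of d."""
    if d == 0:
        return "0"
    return "+" if abs(d) == d else "-"


def tetrad_signs(table):
    """Sign of association in each elementary tetrad of adjacent cells."""
    return [
        [sign(table[r][m] * table[r + 1][m + 1] - table[r][m + 1] * table[r + 1][m])
         for m in range(len(table[0]) - 1)]
        for r in range(len(table) - 1)
    ]


def reorder_columns(table, order):
    """Rearrange the columns of a table into the given order."""
    return [[row[j] for j in order] for row in table]


print(tetrad_signs(TABLE_II))
# Place "red" between "fair" and "brown": Fair, Red, Brown, Black.
print(tetrad_signs(reorder_columns(TABLE_II, [0, 3, 1, 2])))
```

The first line shows the two negative tetrads on the right; after the rearrangement every sign is positive.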

11. A distribution of frequency of such a kind that the
association in every elementary tetrad is of the same sign
possesses several useful and interesting properties, as shown in
the following theorems. It will be termed an isotropic dis-
tribution.

(1) In an isotropic distribution the sign of the association is the same not only for every elementary tetrad of adjacent frequencies, but for every set of four frequencies in the compartments common to two rows and two columns, e.g. (A_mB_n), (A_{m+p}B_n), (A_mB_{n+q}), (A_{m+p}B_{n+q}).

        <pb n="89" />

For suppose that the sign of association in the elementary tetrads is positive, so that—

$$(A_mB_n)(A_{m+1}B_{n+1}) &gt; (A_{m+1}B_n)(A_mB_{n+1}) \qquad (1)$$

and similarly,

$$(A_{m+1}B_n)(A_{m+2}B_{n+1}) &gt; (A_{m+2}B_n)(A_{m+1}B_{n+1}) \qquad (2)$$

Then multiplying up and cancelling we have

$$(A_mB_n)(A_{m+2}B_{n+1}) &gt; (A_{m+2}B_n)(A_mB_{n+1}) \qquad (3)$$

That is to say, the association is still positive though the two columns A_m and A_{m+2} are no longer adjacent.

(2) An isotropic distribution remains isotropic in whatever way it may be condensed by grouping together adjacent rows or columns.

Thus from (1) and (3) we have, adding—

$$(A_mB_n)\left[(A_{m+1}B_{n+1}) + (A_{m+2}B_{n+1})\right] &gt; (A_mB_{n+1})\left[(A_{m+1}B_n) + (A_{m+2}B_n)\right];$$

that is to say, the sign of the elementary association is unaffected by throwing the (m + 1)th and (m + 2)th columns into one.

(3) As the extreme case of the preceding theorem, we may
suppose both rows and columns grouped and regrouped until
only a 2 x 2-fold table is left ; we then have the theorem—

If an isotropic distribution be reduced to a fourfold distribution
in any way whatever, by addition of adjacent rows and columns,
the sign of the association in such fourfold table is the same as in
the elementary tetrads of the original table.
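Theorems (1) to (3) can be checked numerically on Table II. once “red” has been moved next to “fair”, as in § 10. The following is a minimal sketch. `itertools.combinations` enumerates every pair of rows and every pair of columns. The pooling helpers are illustrative names, not anything defined in the text.

```python
from itertools import combinations

# Table II rearranged in the order Fair, Red, Brown, Black (section 10);
# rows Blue, Grey or Green, Brown.
ISO = [
    [1768, 47, 807, 189],
    [946, 53, 1387, 746],
    [115, 16, 438, 288],
]


def sign(d):
    """Return '+', '-' or '0' according to the sign of d."""
    if d == 0:
        return "0"
    return "+" if abs(d) == d else "-"


def all_tetrad_signs(table):
    """Set of signs over every pair of rows taken with every pair of columns."""
    rows, cols = range(len(table)), range(len(table[0]))
    return {
        sign(table[r1][c1] * table[r2][c2] - table[r1][c2] * table[r2][c1])
        for r1, r2 in combinations(rows, 2)
        for c1, c2 in combinations(cols, 2)
    }


def pool_columns(table, groups):
    """Condense by adding adjacent columns, e.g. groups=[[0], [1, 2], [3]]."""
    return [[sum(row[j] for j in g) for g in groups] for row in table]


def pool_rows(table, groups):
    """Condense by adding adjacent rows."""
    return [[sum(table[i][j] for i in g) for j in range(len(table[0]))]
            for g in groups]


print(all_tetrad_signs(ISO))                                    # theorem (1)
print(all_tetrad_signs(pool_columns(ISO, [[0], [1, 2], [3]])))  # theorem (2)
fourfold = pool_rows(pool_columns(ISO, [[0, 1], [2, 3]]), [[0, 1], [2]])
print(fourfold, all_tetrad_signs(fourfold))                     # theorem (3)
```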

The case of complete independence is a special case of isotropy. For if

$$(A_mB_n) = \frac{(A_m)(B_n)}{N}$$

for all values of m and n, the association is evidently zero for
every tetrad. Therefore the distribution remains independent
in whatever way the table be grouped, or in whatever way the
universe be limited by the omission of rows or columns. The
expression “complete independence” is therefore justified.

From the work of the preceding section we may say that Table
II. is not isotropic as it stands, but may be regarded as a disarrangement of an isotropic distribution. It is best to rearrange
such a table in isotropic order, as otherwise different reductions
to fourfold form may lead to associations of different sign, though
of course they need not necessarily do so.

12. The following will serve as an illustration of a table that
is not isotropic, and cannot be rendered isotropic by any rearrange-
ment of the order of rows and columns.

        <pb n="90" />
TABLE IV.—Showing the Frequencies of Different Combinations of Eye-colours in Father and Son.
(Data of Sir F. Galton, from Karl Pearson, Phil. Trans., A, vol. cxcv. (1900), p. 188; classification condensed.)
1. Blue. 2. Blue-green, grey. 3. Dark grey, hazel. 4. Brown.

                        FATHER’S EYE-COLOUR.
SON’S EYE-COLOUR.       1.      2.      3.      4.     Total.
1.                     194      70      41      30      335
2.                      83     124      41      36      284
3.                      25      34      55      23      137
4.                      56      36      43     109      244
Total                  358     264     180     198     1000
The following are the ratios of the frequency in column m to the sum of the frequencies in columns m and m + 1:—

Row.     Columns 1 and 2.     2 and 3.     3 and 4.
1.            0.735             0.631        0.577
2.            0.401             0.752        0.532
3.            0.424             0.382        0.705
4.            0.609             0.456        0.283

The order in which the ratios run is different for each pair of
columns, and it is accordingly impossible to make the table
isotropic. The distribution of signs of association in the several
tetrads is—

+   −   +
−   +   −
−   −   +

The distribution is a curious one, the associations in tetrads
round the diagonal of the whole table being so markedly positive
and those in the immediately adjacent tetrads equally markedly
negative. Neglecting the other signs, this is the effect that
would be produced by taking an isotropic distribution and then
increasing the frequencies in the diagonal compartments by a
sufficient percentage. Comparison of the given table with others
from the same source shows that the peculiarity is common to

        <pb n="91" />
the great majority of the tables, and accordingly its origin
demands explanation. Were such a table treated by the method
of the contingency coefficient, or a similar summary method,
alone, the peculiarity might not be remarked.
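The ratios and signs quoted for Table IV. can be recomputed from its frequencies. The following is a minimal sketch. Rows are taken to be the son’s eye-colour classes 1 to 4 and columns the father’s; this orientation is an assumption of the sketch. The helper names are illustrative.

The output should show the pattern described in the text: positive tetrads down the diagonal, with negative ones immediately beside them.

```python
# Table IV: rows taken to be the son's eye-colour classes 1-4 and
# columns the father's (1 blue, 2 blue-green/grey, 3 dark grey/hazel, 4 brown).
TABLE_IV = [
    [194, 70, 41, 30],
    [83, 124, 41, 36],
    [25, 34, 55, 23],
    [56, 36, 43, 109],
]


def pair_ratios(table):
    """Frequency in column m over the sum for columns m and m + 1, row by row."""
    return [[round(row[m] / (row[m] + row[m + 1]), 3) for m in range(len(row) - 1)]
            for row in table]


def sign(d):
    """Return '+', '-' or '0' according to the sign of d."""
    if d == 0:
        return "0"
    return "+" if abs(d) == d else "-"


def tetrad_signs(table):
    """Sign of association in each elementary tetrad of adjacent cells."""
    return [
        [sign(table[r][m] * table[r + 1][m + 1] - table[r][m + 1] * table[r + 1][m])
         for m in range(len(table[0]) - 1)]
        for r in range(len(table) - 1)
    ]


for row in pair_ratios(TABLE_IV):
    print(row)
for row in tetrad_signs(TABLE_IV):
    print(" ".join(row))
```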

13. It may be noted, in concluding this part of the subject,
that in the case of complete independence the distribution of
frequency in every row is similar to the distribution in the row
of totals, and the distribution in every column similar to that in
the column of totals; for in, say, the column A_1 the frequencies are given by the relations—

$$(A_1B_1) = \frac{(A_1)}{N}(B_1), \quad (A_1B_2) = \frac{(A_1)}{N}(B_2), \quad (A_1B_3) = \frac{(A_1)}{N}(B_3),$$
and so on. This property is of special importance in the theory
of variables.
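Both properties of complete independence follow directly from (A_mB_n) = (A_m)(B_n)/N: the zero association in every tetrad (§ 11), and the similarity of each column to the column of totals (§ 13). The following is a minimal sketch using exact fractions. It borrows the margins of Table II. purely as convenient hypothetical totals; any positive totals with the same sum would serve.

```python
from fractions import Fraction


def independence_table(row_totals, col_totals):
    """(A_mB_n) = (A_m)(B_n)/N in every compartment, as exact fractions."""
    n = sum(row_totals)
    assert n == sum(col_totals), "row and column totals must agree"
    return [[Fraction(r * c, n) for c in col_totals] for r in row_totals]


def column_shares(table, j):
    """Distribution of frequency down column j, as proportions of its total."""
    col = [row[j] for row in table]
    return [x / sum(col) for x in col]


# Hypothetical margins (those of Table II); any positive totals would do.
ROWS, COLS = [2811, 3132, 857], [2829, 2632, 1223, 116]
indep = independence_table(ROWS, COLS)
total_shares = [Fraction(r, sum(ROWS)) for r in ROWS]

similar = all(column_shares(indep, j) == total_shares for j in range(len(COLS)))
zero_tetrads = all(
    indep[r][m] * indep[r + 1][m + 1] == indep[r][m + 1] * indep[r + 1][m]
    for r in range(len(ROWS) - 1) for m in range(len(COLS) - 1)
)
print(similar, zero_tetrads)
```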

14. The classifications both of this and of the preceding chapters have one important characteristic in common, viz. that they are, so to speak, “homogeneous”: the principle of division is the same for all the sub-classes of any one class. Thus A’s and α’s are both subdivided into B’s and β’s; A_1’s, A_2’s, ... into B_1’s, B_2’s, ...; and so on. Clearly this is necessary in order to render possible those comparisons on which the discussions of associations and contingencies depend. If we only know that amongst the A’s there is a certain percentage of B’s, and amongst the α’s a certain percentage of C’s, there are no data for any conclusion.

Many classifications are, however, essentially of a heterogeneous
character, e.g. biological classifications into orders, genera, and
species; the classifications of the causes of death in vital
statistics, and of occupations in the census. To take the last
case as an illustration, the first “order” in the list of occupations
is “General or Local Government of the Country,” subdivided
under the headings (1) National Government, (2) Local Govern-
ment. The next order is “Defence of the Country,” with the sub-
headings (1) Army, (2) Navy and Marines—not (1) National
and (2) Local Government again—the sub-heads are necessarily
distinct. Similarly, the third order is “Professional Occupations
and their Subordinate Services,” with the fresh sub-heads (1)
Clerical, (2) Legal, (3) Medical, (4) Teaching, (5) Literary and
Scientific, (6) Engineers and Surveyors, (7) Art, Music, Drama,
(8) Exhibitions, Games, etc. The number of sub-heads under
each main heading is, in such a case, arbitrary and variable,
and different for each main heading; but so long as the
classification remains purely heterogeneous, however complex
        <pb n="92" />

it may become, there is no opportunity for any discussion
of causation within the limits of the matter so derived. It is
only when a homogeneous division is in some way introduced
that we can begin to speak of associations and contingencies.

15. This may be done in various ways according to the
nature of the case. Thus the relative frequencies of different
botanical families, genera, or species may be discussed in
connection with the topographical characters of their habitats—
desert, marsh, or moor—and we may observe statistical associa-
tions between given genera and situations of a given topographical
type. The causes of death may be classified according to sex,
or age, or occupation, and it then becomes possible to discuss
the association of a given cause of death with one or other
of the two sexes, with a given age-group, or with a given
occupation. Again, the classifications of deaths and of occupations
are repeated at successive intervals of time; and if they have
remained strictly the same, it is also possible to discuss the
association of a given occupation or a given cause of death with
the earlier or later year of observation, i.e. to see whether the
numbers of those engaged in the given occupation or succumbing
to the given cause of death have increased or decreased. But
in such circumstances the greatest care must be taken to see
that the necessary condition as to the identity of the classifications
at the two periods is fulfilled, and unfortunately it very
seldom is fulfilled. All practical schemes of classification are
subject to alteration and improvement from time to time, and
these alterations, however desirable in themselves, render a
certain number of comparisons impossible. Even where a
classification has remained verbally the same, it is not necessarily
really the same; thus, in the case of the causes of death,
improved methods of diagnosis may transfer many deaths from
one heading to another without any change in the incidence
of the disease, and so bring about a virtual change in the
classification. In any case, heterogeneous classification should
be regarded only as a partial process, incomplete until a
homogeneous division is introduced either directly or indirectly,
e.g. by repetition.

REFERENCES.
Contingency.

(1) PEARSON, KARL, “On the Theory of Contingency and its Relation to
Association and Normal Correlation,” Drapers’ Company Research
Memoirs, Biometric Series i.; Dulau &amp; Co., London, 1904. (The
memoir in which the coefficient of contingency is proposed.)

        <pb n="93" />

(2) LIPPS, G. F., “Die Bestimmung der Abhängigkeit zwischen den
Merkmalen eines Gegenstandes,” Berichte der math.-phys. Klasse der
kgl. Sächsischen Gesellschaft der Wissenschaften; Leipzig, 1905. (A
general discussion of the problems of association and contingency.)

(3) PEARSON, KARL, “On a Coefficient of Class Heterogeneity or Divergence,”
Biometrika, vol. v. p. 198, 1906. (An application of the contingency
coefficient to the measurement of heterogeneity, e.g. in different
districts of a country, by treating the observed frequencies of some
quality A_1, A_2, ... in the different districts as rows of a contingency table and working out the coefficient: the same principle is
also applicable to the comparison of a single district with the rest of
the country.)

Isotropy.

(4) YULE, G. U., “On a Property which holds good for all Groupings of a
Normal Distribution of Frequency for Two Variables, with applications
to the Study of Contingency Tables for the Inheritance of Unmeasured
Qualities,” Proc. Roy. Soc., Series A, vol. lxxvii., 1906, p. 324. (On
the property of isotropy and some applications.)

(5) YULE, G. U., “On the Influence of Bias and of Personal Equation in
Statistics of Ill-defined Qualities,” Jour. Anthrop. Inst., vol. xxxvi.,
1906, p. 325. (Includes an investigation as to the influence of bias
and of personal equation in creating divergences from isotropy in
contingency tables.)

Contingency Tables of two Rows only.

(6) PEARSON, KARL, “On a New Method of Determining Correlation between
a Measured Character A and a Character B of which only the Percentage
of Cases wherein B exceeds (or falls short of) a given Intensity is recorded
for each Grade of A,” Biometrika, vol. vii., 1909, p. 96. (Deals with a
measure of dependence for a common type of table, e.g. a table showing
the numbers of candidates who passed or failed at an examination, for
each year of age. The table of such a type stands between the con-
tingency tables for unmeasured characters and the correlation table
(chap. ix.) for variables. Pearson’s method is based on that adopted
for the correlation table, and assumes a normal distribution of fre-
quency (chap. xv.) for B.)

(7) PEARSON, KARL, “On a New Method of Determining Correlation, when
one Variable is given by Alternative and the other by Multiple
Categories,” Biometrika, vol. vii., 1910, p. 248. (The similar
problem for the case in which the variable is replaced by an un-
measured quality.)

EXERCISES.

(1) (Data from Karl Pearson, “On the Inheritance of the Mental and Moral Characters in Man,” Jour. of the Anthrop. Inst., vol. xxxiii., and Biometrika, vol. iii.) Find the coefficient of contingency (coefficient of mean square contingency) for the two tables below, showing the resemblance between brothers for athletic capacity and between sisters for temper. Show that neither table is even remotely isotropic. (As stated in § 7, the coefficient of contingency should not as a rule be used for tables smaller than 5 × 5-fold: these small tables are given to illustrate the method, while avoiding lengthy arithmetic.)

        <pb n="94" />
A. ATHLETIC CAPACITY.

                           First Brother.
Second Brother.     Athletic.   Betwixt.   Non-athletic.   Total.
Athletic               906         20          140          1066
Betwixt                 20         76            9           105
Non-athletic           140          9          370           519
Total                 1066        105          519          1690

B. TEMPER.

                           First Sister.
Second Sister.      Quick.   Good-natured.   Sullen.   Total.
Quick                198          177           77        452
Good-natured         177          996          165       1338
Sullen                77          165          120        362
Total                452         1338          362       2152
        <pb n="95" />
        PART II.—THE THEORY OF VARIABLES.
CHAPTER VI.
THE FREQUENCY-DISTRIBUTION.

1. Introductory—2. Necessity for classification of observations: the frequency
distribution—3. Illustrations—4. Method of forming the table—5.
Magnitude of class-interval—6. Position of intervals—7. Process of
classification—8. Treatment of intermediate observations—9. Tabula-
tion—10. Tables with unequal intervals—11. Graphical representa-
tion of the frequency-distribution—12. Ideal frequency-distributions
—13. The symmetrical distribution—14. The moderately asymmetri-
cal distribution—15. The extremely asymmetrical or J-shaped dis-
tribution—16. The U-shaped distribution.

1. THE methods described in Chaps. I.-V. are applicable to all observations, whether qualitative or quantitative; we have now to proceed to the consideration of specialised processes, definitely adapted to the treatment of quantitative measurements, but not as a rule available (with some important exceptions, as suggested by Chap. I. § 2) for the discussion of purely qualitative observations. Since numerical measurement is applied only in the case of a quantity that can present more than one numerical value, that is, a varying quantity, or more shortly a variable, this section of the work may be termed the theory of variables. As common examples of such variables that are subject to statistical treatment may be cited birth- or death-rates, prices, wages, barometer readings, rainfall records, and measurements or enumerations (e.g. of glands, spines, or petals) on animals or plants.

2. If some hundreds or thousands of values of a variable have
been noted merely in the arbitrary order in which they happened
to occur, the mind cannot properly grasp the significance of the
record : the observations must be ranked or classified in some
way before the characteristics of the series can be comprehended,
and those comparisons, on which arguments as to causation
depend, can be made with other series. The dichotomous
        <pb n="96" />
classification, considered in Chaps. I.-IV., is too crude: if the values are
merely classified as A’s or α’s according as they exceed or fall
short of some fixed value, a large part of the information given
by the original record is lost. A manifold classification, however
(cf. Chap. V.), avoids the crudity of the dichotomous form, since
the classes may be made as numerous as we please, and numerical
measurements lend themselves with peculiar readiness to a
manifold classification, for the class limits can be conveniently
and precisely defined by assigned values of the variable. For
convenience, the values of the variable chosen to define the
successive classes should be equidistant, so that the numbers of
observations in the different classes (the class-frequencies) may be
comparable. Thus for measurements of stature the interval
chosen for classifying (the class-interval, as it may be termed)
might be 1 inch, or 2 centimetres, the numbers of individuals
being counted whose statures fall within each successive inch, or
each successive 2 centimetres, of the scale; returns of birth- or
death-rates might be grouped to the nearest unit per thousand
of the population; returns of wages might be classified to the
nearest shilling, or, if desired to obtain a more condensed table,
by intervals of five shillings or ten shillings, and so on. When
the variation is discontinuous, as for example in enumerations
of numbers of children in families or of petals on flowers, the
unit is naturally taken as the class-interval unless the range of
variation is very great. The manner in which the observations
are distributed over the successive equal intervals of the scale is
spoken of as the frequency-distribution of the variable.

3. A few illustrations will make clearer the nature of such
frequency-distributions, and the service which they render in
summarising a long and complex record :—

(a) Table I. In this illustration the mean annual death-rates,
expressed as proportions per thousand of the population per
annum, of the 632 registration districts of England and Wales,
for the decade 1881-90, have been classified to the nearest unit;
i.e. the numbers of districts have been counted in which the
death-rate was over 12.5 but under 13.5, over 13.5 but under
14.5, and so on. The frequency-distribution is shown by the
following table.

        <pb n="97" />
        VI.—THE FREQUENCY-DISTRIBUTION.
TABLE I.—Showing the Numbers of Registration Districts in England and Wales with Different Mean Death-rates per Thousand of the Population per Annum for the Ten Years 1881-90. (Material from the Supplement to the 55th Annual Report of the Registrar-General for England and Wales [C.—7769] 1895.)

Mean Annual Death-rate      Number of Districts with
between Limits stated.      Death-rate between Limits stated.
12.5-13.5                     5
13.5-14.5                    16
14.5-15.5
15.5-16.5
16.5-17.5
17.5-18.5
18.5-19.5
19.5-20.5
20.5-21.5                    20
21.5-22.5
22.5-23.5
23.5-24.5
24.5-25.5
25.5-26.5
26.5-27.5
27.5-28.5
28.5-29.5
29.5-30.5
30.5-31.5
31.5-32.5
32.5-33.5
Total                       632

Whilst a glance through the original returns fails to convey
any very definite impression, owing to the large and erratic
differences between the death-rates in successive districts, a brief
inspection of the above table brings out a number of important
points. Thus we see that the death-rates range, in round
numbers, from 13 to 33 per thousand per annum, but in the
great majority of districts lie nearer the lower limit than the
upper ; that the death-rates in some 60 per cent. of the districts
lie within the narrow limits 15.5 to 18.5, the rates being most
frequent near 17 per thousand, and so forth.

(b) Table II. The ages at death, in years, of the married women in certain Quaker families were recorded and classified in 5-year groups according as they were over 17.5 but under 22.5, over 22.5 but under 27.5, and so on. The frequency-distribution was as follows:—

        <pb n="98" />

TABLE II.—Showing the Numbers of Married Women, in certain Quaker Families, Dying at Different Ages. (Cited from Proc. Roy. Soc., vol. lxvii. (1900), p. 172. On the Correlation between Duration of Life and Number of Offspring, by Miss M. Beeton, Karl Pearson, and G. U. Yule.)

Age at Death,        Number of Women Dying
Years.               between said Years of Age.
17.5-22.5              29
22.5-27.5              87
27.5-32.5              99
32.5-37.5             109
37.5-42.5              90
42.5-47.5              87
47.5-52.5              64
52.5-57.5              54
57.5-62.5              69
62.5-67.5              73
67.5-72.5              83
72.5-77.5              77
77.5-82.5              78
82.5-87.5              59
87.5-92.5              26
92.5-97.5               7
97.5-102.5
Total

The distribution is somewhat more irregular than in the last
case; the commencement is abrupt; a maximum frequency is
attained in the fourth class (age at death 32·5 to 37·5), and then
there is a slow fall to the age-class 52·5-57·5. After this class
the frequency rises again and attains a secondary maximum in
the age-class 67·5-72·5.

(c) Table III. The numbers of stigmatic rays on a number
of Shirley poppies were counted. As the range of variation is
not great, the unit is taken as the class-interval. The frequency-
distribution is given by the following table.

TABLE III.—Showing the Frequencies of Seed Capsules on certain Shirley
Poppies, with Different Numbers of Stigmatic Rays. (Cited from
Biometrika, ii. p. 89, 1902.)

Number of     Number of Capsules     Number of     Number of Capsules
Stigmatic     with said Number       Stigmatic     with said Number
Rays.         of Stigmatic Rays.     Rays.         of Stigmatic Rays.
 6               3                   14              302
 7              11                   15              234
 8               …                   16              128
 9             106                   17               50
10             152                   18               19
11             238                   19                …
12             305                   20                …
13             315
                                     Total             …

        <pb n="99" />

The numbers of rays range from 6 to 20,—12, 13, or 14 rays
being the most usual.

4. To expand slightly the brief description given in § 2, tables
like the preceding are formed in the following way :—(1) The
magnitude of the class-interval, i.e. the number of units to each
interval, is first fixed ; one unit was chosen in the case of Tables
I. and III., five units in the case of Table II. (2) The position or
origin of the intervals must then be determined, e.g. in Table I.
we must decide whether to take as intervals 12-13, 13-14, 14-15,
etc., or 12·5-13·5, 13·5-14·5, 14·5-15·5, etc. (3) This choice
having been made, the complete scale of intervals is fixed, and the
observations are classified accordingly. (4) The process of
classification being finished, a table is drawn up on the general
lines of Tables I.-III., showing the total numbers of observations
in each class-interval. Some remarks may be made on each of
these heads.
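The four steps can be sketched in Python as a minimal illustration. It assumes the observations are plain numbers and that none falls exactly on a class-limit, which is the case treated in § 8. The function name `frequency_table`, its parameters and the sample death-rates are hypothetical.

```python
from fractions import Fraction
from math import floor

def frequency_table(values, width, origin):
    """Steps (1)-(4): classify values into equal class-intervals.

    The intervals run from origin + k*width to origin + (k+1)*width.  The result
    lists (lower, upper, count) for every class, empty classes included.
    Assumes no value lies exactly on a class-limit (see § 8).
    """
    width, origin = Fraction(width), Fraction(origin)   # steps (1) and (2)
    counts = {}
    for v in values:
        k = floor((Fraction(v) - origin) / width)       # step (3): assign a class
        counts[k] = counts.get(k, 0) + 1
    lo, hi = min(counts), max(counts)
    # step (4): tabulate every interval from the lowest to the highest class
    return [(origin + k * width, origin + (k + 1) * width, counts.get(k, 0))
            for k in range(lo, hi + 1)]

# Hypothetical death-rates, grouped as in Table I: unit interval, limits at half-units
rates = ["13.1", "14.2", "14.8", "15.7", "16.9", "16.2", "17.0", "20.9"]
for lower, upper, n in frequency_table(rates, 1, "0.5"):
    print(f"{float(lower):5.1f}-{float(upper):5.1f}  {n}")
```

Exact fractions are used so that a limit such as 16·5 is represented exactly and not as a binary approximation.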

5. Magnitude of Class-Interval.—As already remarked, in cases
where the variation proceeds by discrete steps of considerable
magnitude as compared with the range of variation, there is very
little choice as regards the magnitude of the class-interval. The
unit will in general have to serve. But if the variation be con-
tinuous, or at least take place by discrete steps which are small
in comparison with the whole range of variation, there is no such
natural class-interval, and its choice is a matter for judgment.

The two conditions which guide the choice are these: (a) we
desire to be able to treat all the values assigned to any one class,
without serious error, as if they were equal to the mid-value
of the class-interval, e.g. as if the death-rate of every district in
the first class of Table I. were exactly 13·0, the death-rate of
every district in the second class 14·0, and so on; (b) for
convenience and brevity we desire to make the interval as large as
possible, subject to the first condition. These conditions will
generally be fulfilled if the interval be so chosen that the whole
number of classes lies between 15 and 25. A number of classes
less than, say, ten leads in general to very appreciable inaccuracy,
and a number over, say, thirty makes a somewhat unwieldy
table. A preliminary inspection of the record should accordingly
be made and the highest and lowest values be picked out.
Dividing the difference between these by, say, five and twenty, we
have an approximate value for the interval. The actual value
should be the nearest integer or simple fraction.
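The rule of thumb above can be sketched as follows: divide the range by five and twenty, then round to a "simple" value. The set of simple values used here (1, 2, 2·5 or 5 times a power of ten) is an assumption of this sketch, not a rule from the text. The function names are hypothetical.

```python
from fractions import Fraction
import math

def suggest_interval(lowest, highest, divisor=25):
    """Rough class-interval: the range divided by `divisor` (five and twenty),
    rounded to the nearest 'simple' value.  Treating 1, 2, 2.5 and 5 times a
    power of ten as the simple values is an assumption of this sketch."""
    raw = (Fraction(str(highest)) - Fraction(str(lowest))) / divisor
    exponent = math.floor(math.log10(raw))
    candidates = [Fraction(m) * Fraction(10) ** e
                  for e in (exponent - 1, exponent, exponent + 1)
                  for m in ("1", "2", "2.5", "5")]
    return min(candidates, key=lambda c: abs(c - raw))

def number_of_classes(lowest, highest, width):
    """Classes needed to cover the range when the limits fall on multiples of width."""
    width = Fraction(width)
    first = math.floor(Fraction(str(lowest)) / width)
    last = math.floor(Fraction(str(highest)) / width)
    return last - first + 1

# Statures spanning roughly 57 to 77 inches, as in Table VI
w = suggest_interval(57, 77)
print(w, number_of_classes(57, 77, w))
```

For that stature range the sketch proposes a 1-inch interval and 21 classes, which falls inside the suggested range of 15 to 25.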

6. Position of Intervals.—The position or starting-point of the
intervals is, as a rule, more or less indifferent, but in general it
is fixed either so that the limits of intervals are integers, or, as in
Tables I. and II., so that the mid-values are integers. It may,

        <pb n="100" />

however, be chosen, for simplicity in classification, so that no
limit corresponds exactly to any recorded value (cf. § 8 below). In
some exceptional cases, moreover, the observations exhibit a marked
clustering round certain values, e.g. tens, or tens and fives. This
is generally the case, for instance, in age returns, owing to the
tendency to state a round number where the true age is unknown.
Under such circumstances, the values round which there is a
marked tendency to cluster should preferably be made mid-values
of intervals, in order to avoid sensible error in the assumption that
the mid-value is approximately representative of the values in the
class. Thus, in the case of ages, since the clustering is chiefly round
tens, “25 and under 35,” “35 and under 45,” etc., the classification
of the English census, is a better grouping than “20 and under
30,” “30 and under 40,” and so on (cf. the Census of England and
Wales, 1911, vol. vii., and also ref. 5, in which a different view is
taken). When there is any probability of a clustering of this kind
occurring, it is as well to subject the raw material to a close
examination before finally fixing the classification.
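One simple way to carry out that preliminary examination is to count the recorded values by their final digit. This is a sketch under an assumption the text does not state: that, without any preference for round numbers, the ten digits would be roughly equally common. The ages and function names below are hypothetical.

```python
from collections import Counter

def terminal_digit_counts(ages):
    """Count recorded ages by their last digit; an excess at 0 and 5
    suggests clustering on round numbers."""
    return Counter(int(age) % 10 for age in ages)

def share_on_round_numbers(ages, base=10):
    """Proportion of recorded ages that are exact multiples of `base`."""
    ages = list(ages)
    return sum(1 for a in ages if a % base == 0) / len(ages)

# Hypothetical age returns, invented for illustration only
ages = [30, 30, 40, 45, 50, 50, 50, 33, 37, 41, 60, 52, 40, 35, 48, 70]
print(terminal_digit_counts(ages).most_common(3))
print(f"{share_on_round_numbers(ages):.2f} of the ages end in 0")
```

A heavy excess at 0 would favour limits such as "25 and under 35", which place the round values at the mid-values of the intervals.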

7. Classification.—The scale of intervals having been fixed, the
observations may be classified. If the number of observations is
not large, it will be sufficient to mark the limits of successive
intervals in a column down the left-hand side of a sheet of paper,
and transfer the entries of the original record to this sheet by
marking a 1 on the line corresponding to any class for each entry
assigned thereto. It saves time in subsequent totalling if each
fifth entry in a class is marked by a diagonal across the preceding
four, or by leaving a space.

The disadvantage in this process is that it offers no facilities for
checking: if a repetition of the classification leads to a different
result, there is no means of tracing the error. If the number of
observations is at all considerable and accuracy is essential, it is
accordingly better to enter the values observed on cards, one to
each observation. These are then dealt out into packs according
to their classes, and the whole work checked by running through
the pack corresponding to each class, and verifying that no cards
have been wrongly sorted.
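The card method and its check can be mimicked in a few lines: the values are dealt into packs, and each pack is then run through to find cards lying outside its limits. The tally function reproduces the "diagonal across the preceding four" convention in plain characters. All names and the sample values are hypothetical. As in § 7, values exactly on a limit are assumed not to occur.

```python
from math import floor

def deal_into_packs(cards, width, origin):
    """Deal one card per observation into packs, one pack per class-interval."""
    packs = {}
    for value in cards:
        k = floor((value - origin) / width)   # class index for this card
        packs.setdefault(k, []).append(value)
    return packs

def check_packs(packs, width, origin):
    """Run through every pack and return the cards lying outside its class-limits."""
    wrongly_sorted = []
    for k, pack in packs.items():
        lower, upper = origin + k * width, origin + (k + 1) * width
        wrongly_sorted += [v for v in pack if not upper > v >= lower]
    return wrongly_sorted

def tally(n):
    """Tally marks for n entries, every fifth drawn across the preceding four."""
    groups = ["||||/"] * (n // 5)
    if n % 5:
        groups.append("|" * (n % 5))
    return " ".join(groups)

# Hypothetical death-rates, one card each, sorted into classes 12.5-13.5, 13.5-14.5, ...
packs = deal_into_packs([13.1, 14.2, 16.9, 16.2, 17.0, 16.8], 1, 0.5)
for k in sorted(packs):
    print(f"{k + 0.5}-{k + 1.5}: {tally(len(packs[k]))}")
print("wrongly sorted:", check_packs(packs, 1, 0.5))
```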

8. In some cases difficulties may arise in classifying, owing to
the occurrence of observed values corresponding to class-limits.
Thus, in compiling Table I., some districts will have been noted
with death-rates entered in the Registrar-General’s returns as
16·5, 17·5, or 18·5, any one of which might at first sight have
been apparently assigned indifferently to either of two adjacent
classes. In such a case, however, where the original figures for
numbers of deaths and population are available, the difficulty may
be readily surmounted by working out the rate to another place

        <pb n="101" />
of decimals: if the rate stated to be 16·50 proves to be 16·502, it
will be sorted to the class 16·5-17·5; if 16·498, to the class
15·5-16·5. Death-rates that work out to half-units exactly do
not occur in this example, and so there is no real difficulty. In
the case of Table II., again, there is no difficulty : if the year of
birth and death alone are given, the age at death is only calcul-
able to the nearest unit; if the actual day of birth and death be
cited, half-years still cannot occur in the age at death, because
there is an odd number of days in the year. The difficulty may
always be avoided if it be borne in mind in fixing the limits
to class-intervals, these being carried to a further place of decimals,
or a smaller fraction, than the values in the original record. Thus
if statures are measured to the nearest centimetre, the class-intervals
may be taken as 150·5-151·5, 151·5-152·5, etc.; if to
the nearest eighth of an inch, the intervals may be 59 15/16-60 15/16,
60 15/16-61 15/16, and so on.

If the difficulty is not evaded in any of these ways, it is
usual to assign one-half of an intermediate observation to each
adjacent class, with the result that half-units occur in the
class-frequencies (cf. Tables VII., p. 90, X., p. 96, and XI.,
p. 96). The procedure is rough, but probably good enough for
practical purposes; strict precision is usually unattainable, for in
point of fact the odd way in which different individuals read a
scale (cf. Supplement I.) renders it impossible to assign exact
limits to intervals.
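Both remedies described in this section can be sketched briefly: working the rate out exactly from the numbers of deaths and population, and, where that is impossible, splitting an observation that lies on a limit half-and-half between the adjacent classes. The district figures below are hypothetical, chosen so that both rates read 16·50 to two places. The function names are also hypothetical.

```python
from fractions import Fraction
from math import floor

def death_rate(deaths, population):
    """Death-rate per 1000 per annum, kept exact as a fraction."""
    return Fraction(1000 * deaths, population)

def classify_with_halves(values, width, origin):
    """Class-frequencies in which a value lying exactly on a class-limit is
    split, one-half to each of the two adjacent classes."""
    width, origin = Fraction(width), Fraction(origin)
    freq = {}
    for v in values:
        position = (Fraction(v) - origin) / width
        k = floor(position)
        if position == k:   # on the limit between classes k - 1 and k
            for j in (k - 1, k):
                freq[j] = freq.get(j, 0) + Fraction(1, 2)
        else:
            freq[k] = freq.get(k, 0) + 1
    return freq

# Hypothetical districts of 500,000 people whose rates both read 16.50 to two places
for deaths in (8251, 8249):
    rate = death_rate(deaths, 500_000)
    k = floor(rate - Fraction(1, 2))   # classes 15.5-16.5, 16.5-17.5, ...
    print(f"{float(rate):.3f} -> class {k + 0.5}-{k + 1.5}")

# Rates known only to one place: 16.5 is split between the two adjacent classes
print(classify_with_halves(["16.5", "17.2", "15.9"], 1, "0.5"))
```

The half-unit frequencies that the second method produces are the same kind as those seen in Tables VII., X. and XI.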

9. Tabulation.—As regards the actual drafting of the final
table, there is little to be said, except that care should be taken
to express the class-limits clearly, and, if necessary, to state the
manner in which the difficulty of intermediate values has been
met or evaded. The class-limits are perhaps best given as in
Tables I. and II., but may be more briefly indicated by the
mid-values of the class-intervals. Thus Table I. might have been
given in the form—

Death-rate per 1000      Number of
per annum to the         Districts with
Nearest Unit.            said Death-rate.
13                          5
14                         16
15                         61
16                        112
etc.                       etc.
A common mode of defining the class-intervals is to state the
limits in the form “x and less than y.” In the case of
measurements of stature, for example, the table might run:

        <pb n="102" />
Stature in Inches.          Frequency.
57 and less than 58             2
58 and less than 59             4
59 and less than 60            14
etc.                          etc.
the statement “57 and less than 58,” etc., being often abbreviated
to 57-, 58-, 59-, etc. (cf. Table VI., p. 88). The mode of grouping
is, in effect, that described in the last paragraph as of service in
avoiding intermediate observations, but it should be noted that the
form of statement leaves the class-limits uncertain unless the degree
of accuracy of the measurements is also given. Thus, if measurements
were taken to the nearest eighth of an inch, the class-limits are
really 56 15/16-57 15/16, 57 15/16-58 15/16, etc.; if they were only
taken to the nearest quarter of an inch, the limits are 56 7/8-57 7/8,
57 7/8-58 7/8, etc. With such a form of tabulation a statement as to
the number of significant figures in the original record is therefore
essential. It is better, perhaps, to state the true class-limits and
avoid ambiguity.
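The arithmetic behind those true limits can be checked with a small helper. In a class "x and less than y", the smallest recorded value x stands for readings down to x minus half the recording unit. The largest recorded value, y minus one unit, stands for readings up to y minus half the unit. The helper names `true_limits` and `as_mixed` are hypothetical.

```python
from fractions import Fraction

def true_limits(lower, upper, unit):
    """True limits of the class "lower and less than upper" when every
    measurement was recorded to the nearest `unit`."""
    half = Fraction(unit) / 2
    return Fraction(lower) - half, Fraction(upper) - half

def as_mixed(x):
    """Write a fraction in the printer's style, e.g. 56 15/16."""
    whole, rest = divmod(x, 1)
    return f"{whole} {rest}" if rest else str(whole)

for unit in (Fraction(1, 8), Fraction(1, 4)):
    lo, hi = true_limits(57, 58, unit)
    print(f"to nearest {unit} in.: {as_mixed(lo)} to {as_mixed(hi)}")
```

The same function gives the weight classes of Table IX: `true_limits(90, 100, 1)` returns 89½ and 99½.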

10. The rule that class-intervals should be all equal is one
that is very frequently broken in official statistical publications,
principally in order to condense an otherwise unwieldy table,
thus not only saving space in printing but also considerable
expense in compilation, or possibly, in the case of confidential
figures, to avoid giving a class which would contain only one or
two observations, the identity of which might be guessed. It
would hardly be legitimate, for example, to give a return of
incomes relating to a limited district in such a form that the
income of the two or three wealthiest men in the district would
be clear to any intelligent reader with local knowledge. If the
intervals be made unequal, the application of many statistical
methods is rendered awkward, or even impossible, and the
relative values of the frequencies are at first sight misleading, so
that the table is not perspicuous. Thus, consider the first two
columns of Table IV., showing the numbers of dwelling-houses
of different annual values, assessed to inhabited house duty. On
running the eye down the column headed “number of houses” it
is at once caught by the two striking irregularities at the classes
“£60 and under £80,” and “£100 and under £150.” But these
have no real significance ; they are merely due to changes from
a £10 to a £20, and then to a £50 interval. Moreover, the
intervals after £150 go on continuously increasing, but attention
is not directed thereto by any marked changes in the frequencies.
To make the latter really comparable inter se, they must first be

        <pb n="103" />
TABLE IV.—Showing the Annual Value and Number of Dwelling-houses in
Great Britain assessed to Inhabited House Duty in 1885-6. (Cited from
Jour. Roy. Stat. Soc., vol. l., 1887, p. 610.)

                               Number of     Average Number
Annual Value in £s.            Houses.       per £10 Interval.
£20 and under £30               306,408         306,408
£30 and under £40               182,972         182,972
£40 and under £50               105,407         105,407
£50 and under £60                63,096          63,096
£60 and under £80                71,436          35,718
£80 and under £100               52,365          26,182
£100 and under £150              41,336           8,267
£150 and under £300              26,732             …
£300 and under £500               6,198             …
£500 and under £1000              2,008             …
£1000 and upwards                   …
Total number of houses              …
reduced to a common interval as basis, e.g. £10, by dividing the
fifth and sixth numbers by 2, the seventh by 5, the eighth by 15,
and so on. This gives the mean frequencies per £10 interval
tabulated in the third column of Table IV. The reduction is,
however, impossible in the case of the last class, for we are only
told the number of houses of £1000 annual value and upwards :
the magnitude of the class is indefinite. Such an indefinite class
is in many respects a great inconvenience, and should always be
avoided in work not subject to the necessary limitations of
official publications.
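The reduction to a common £10 interval can be sketched as follows: each frequency is multiplied by 10 and divided by the width of its class. The numbers of houses are those of Table IV. The function name is hypothetical. An open class has no finite width, so the sketch returns `None` for it, as the text explains.

```python
def per_common_interval(classes, common=10):
    """Mean frequency per `common`-unit interval for classes of unequal width.

    `classes` holds (lower, upper, frequency); an open class such as
    "£1000 and upwards" is given upper=None and cannot be reduced.
    """
    reduced = []
    for lower, upper, freq in classes:
        if upper is None:
            reduced.append(None)          # indefinite class: no width to divide by
        else:
            reduced.append(freq * common / (upper - lower))
    return reduced

# Table IV, without the open class "£1000 and upwards"
table_iv = [(20, 30, 306_408), (30, 40, 182_972), (40, 50, 105_407),
            (50, 60, 63_096), (60, 80, 71_436), (80, 100, 52_365),
            (100, 150, 41_336), (150, 300, 26_732), (300, 500, 6_198),
            (500, 1000, 2_008)]
for (lower, upper, n), r in zip(table_iv, per_common_interval(table_iv)):
    print(f"£{lower} and under £{upper}: {n:>7,}  {r:>9,.1f} per £10")
```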

The general rule that intervals should be equal must not be
held to bar the analysis by smaller equal intervals of some
portion of the range over which the frequency varies very
rapidly. In Table XII., p. 98, for example, giving the numbers
of deaths from diphtheria at successive ages, a five-year interval
might be substituted with advantage for the irregular intervals
after the fifth year of age, but it would still be desirable to give
the numbers of deaths in each year for the first five years, so as
to bring out the rapid rise to the maximum in the fourth year
of life.

11. When the table has been completed, it is often convenient
to represent the frequency-distribution by means of a diagram
which conveys the general run of the observations to the eye
better than a column of figures. The following short table,

        <pb n="104" />

giving the distribution of head-breadths for 1000 men, will serve
as an example.

TABLE V.—Showing the Frequency-distribution of Head-breadths for Students
at Cambridge. Measurements taken to the nearest tenth of an inch.
(Cited from W. R. Macdonell, Biometrika, i., 1902, p. 220.)

                Number of Men                       Number of Men
Head-breadth,   with said          Head-breadth,    with said
Inches.         Head-breadth.      Inches.          Head-breadth.
5·5                 3              6·3                  99
5·6                12              6·4                  37
5·7                43              6·5                  15
5·8                80              6·6                  12
5·9               131              6·7                   3
6·0               236              6·8                   2
6·1               185
6·2               142              Total              1000
Taking a piece of squared paper ruled, say, in inches and tenths,
mark off along a horizontal base-line a scale representing class-
intervals ; a half-inch to the class-interval would be suitable.

Then choose a vertical scale for the class-frequencies, say 50
observations per interval to the inch, and mark off, on the
verticals or ordinates through the points marked 5·5, 5·6, 5·7
. . . at the centres of the class-intervals on the base-line, heights
representing on this scale the class-frequencies 3, 12, 43 . . . .

The diagram may then be completed in one of two ways: (1)
as a frequency-polygon, by joining up the marks on the verticals
by straight lines, the last points at each end being joined
down to the base at the centre of the next class-interval (fig. 1);
or (2) as a column diagram or histogram (to use a term suggested
by Professor Pearson, ref. 1), short horizontals being drawn
through the marks on the verticals (fig. 2), which now form the
central axes of a series of rectangles representing the
class-frequencies. The student should note that in any such diagram,
of either form, a certain area represents a given number of
observations. On the scales suggested, 1 inch on the horizontal
represents 2 intervals, and 1 inch on the vertical represents 50
observations per interval: 1 square inch therefore represents
50 × 2 = 100 observations. The diagrams are, however,
conventional: the whole area of the figure is correct in either case,
but the area over each interval is not correct in the case of the
frequency-polygon, and the frequency of each fraction of any
        <pb n="105" />
Fig. 1.—Frequency-Polygon for Head-breadths of 1000 Cambridge Students. (Table V.) [Horizontal scale: head-breadth in inches.]
Fig. 2.—Histogram for the same data as Fig. 1. [Horizontal scale: head-breadth in inches.]

        <pb n="106" />
interval is not the same, as suggested by the histogram. The
area shown by the frequency-polygon over any interval with an
ordinate $y_2$ (fig. 3) is only correct if the tops of the three
successive ordinates $y_1$, $y_2$, $y_3$ lie on a line, i.e. if
$y_2 = \tfrac{1}{2}(y_1 + y_3)$, the areas of the two little triangles
shaded in the figure being equal. If $y_2$ fall short of this value,
the area shown by the polygon is too great; if $y_2$ exceed it, the
area shown by the polygon is too small; and if, for this reason, the
frequency-polygon tends to become very misleading at any part of the
range, it is better to use the histogram. In the mortality distribution
of Table II., for instance, the frequency rises so sharply

        <pb n="107" />
to the maximum that a histogram is, on the whole, the better re-
presentation of the distribution of frequency, and in such a
distribution as that of Table IV. the use of the histogram is
almost imperative.
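The area statements of §11 can be checked numerically on Table V. Over one interval of width $h$, the polygon consists of two trapezoids, one on each side of the mid-ordinate. A short calculation gives their combined area as $h(y_1 + 6y_2 + y_3)/8$. This equals the histogram's $h y_2$ only when $y_2 = \tfrac{1}{2}(y_1 + y_3)$. The formula is derived here rather than taken from the text, and the function names are hypothetical.

```python
def histogram_area(freqs, width):
    """Area of the histogram: a rectangle of height f on base `width` per class."""
    return sum(width * f for f in freqs)

def polygon_area(freqs, width):
    """Area under the frequency-polygon whose end points are joined down to the
    base at the centres of the next class-intervals (trapezoid rule)."""
    ys = [0, *freqs, 0]
    return sum(width * (a + b) / 2 for a, b in zip(ys, ys[1:]))

def polygon_area_over_interval(y1, y2, y3, width):
    """Area under the polygon over the single interval whose mid-ordinate is y2,
    y1 and y3 being the mid-ordinates of the neighbouring intervals."""
    return width * (y1 + 6 * y2 + y3) / 8

# Table V: head-breadths 5.5, 5.6, ..., 6.8 inches, interval 0.1 inch
table_v = [3, 12, 43, 80, 131, 236, 185, 142, 99, 37, 15, 12, 3, 2]
print(histogram_area(table_v, 0.1), polygon_area(table_v, 0.1))
# Interval with mid-value 6.0: 236 exceeds (131 + 185) / 2, so the polygon understates
print(polygon_area_over_interval(131, 236, 185, 0.1), 236 * 0.1)
```

The whole areas agree at 1000 observations × 0·1 inch. Over the modal interval, however, the polygon shows 21·65 against the histogram's 23·6, which is the "too small" case described in the text.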

12. If the class-interval be made smaller and smaller, and at
the same time the number of observations be proportionately in-
creased, so that the class-frequencies may remain finite, the
polygon and the histogram will approach more and more closely
to a smooth curve. Such an ideal limit to the frequency-polygon
or histogram is termed a frequency-curve. In this ideal frequency-
curve the area between any two ordinates whatever is strictly
proportional to the number of observations falling between the
corresponding values of the variable. Thus the number of
observations falling between the values $x_1$ and $x_2$ of the variable
in fig. 4 will be proportional to the area of the shaded strip in the
figure; the number of observed values greater than $x_2$ will
similarly be given by the area of the curve to the right of the
ordinate through $x_2$, and so on. When, in any actual case, the
number of observations is considerable—say a thousand at least
—the run of the class-frequencies is generally sufficiently
smooth to give a good notion of the form of the ideal distri-
bution ; with small numbers the frequencies may present all
kinds of irregularities, which, most probably, have very little
significance (cf. Chap. XV. § 15, and § 18, Ex. iv.). The forms
presented by smoothly running sets of numerous observations
present an almost endless variety, but amongst these we notice
a small number of comparatively simple types, from which many
at least of the more complex distributions may be conceived as
compounded. For elementary purposes it is sufficient to consider
these fundamental simple types as four in number, the symmetri-
cal distribution, the moderately asymmetrical distribution, the
extremely asymmetrical or J-shaped distribution, and the U-shaped
distribution.
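The idea of an ideal frequency-curve as the limit of the histogram can be illustrated with a simulation. Purely for illustration, a smooth curve is assumed: a normal curve from Python's `statistics.NormalDist`, with a hypothetical, roughly stature-like mean and spread. The text has not yet introduced this particular curve; it only serves as a convenient smooth shape. The number expected between two ordinates is N times the area under a unit-area curve. As the interval narrows and N grows, the histogram bars come closer to the curve.

```python
from statistics import NormalDist

def expected_between(curve, n, x1, x2):
    """Observations expected between the ordinates at x1 and x2: n times the
    area under a frequency-curve of unit total area."""
    return n * (curve.cdf(x2) - curve.cdf(x1))

def histogram_heights(sample, width, lo, hi):
    """Histogram heights per unit of the variable, scaled to unit total area
    so that they can be compared directly with the curve."""
    k = round((hi - lo) / width)
    counts = [0] * k
    for x in sample:
        i = int((x - lo) // width)
        if k > i >= 0:
            counts[i] += 1
    return [c / (len(sample) * width) for c in counts]

# Hypothetical smooth curve, loosely stature-like (mean 67.5 in., s.d. 2.6 in.)
curve = NormalDist(mu=67.5, sigma=2.6)
print(round(expected_between(curve, 1000, 66.5, 68.5)))

# Narrower intervals and more observations: the bar starting at 67.5 nears the curve
for n, width in ((1_000, 2.5), (100_000, 0.5)):
    heights = histogram_heights(curve.samples(n, seed=1), width, 55.0, 80.0)
    i = int((67.5 - 55.0) // width)
    centre = 55.0 + (i + 0.5) * width
    print(f"n={n}, interval={width}: bar {heights[i]:.4f}, curve {curve.pdf(centre):.4f}")
```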

13. The symmetrical distribution, the class-frequencies decreas-
ing to zero symmetrically on either side of a central maximum.
Fig. 5 illustrates the ideal form of the distribution.

Being a special case of the more general type described under
the second heading, this form of distribution is comparatively rare
under any circumstances, and very exceptional indeed in economic
statistics. It occurs more frequently in the case of biometric, more
especially anthropometric, measurements, from which the following
illustrations are drawn, and is important in much theoretical work.
Table VI. shows the frequency-distribution of statures for adult
males in the British Isles, from data published by a British
Association Committee in 1883, the figures being given separately
        <pb n="108" />
TABLE VI.—Showing the Frequency-distributions of Statures for Adult
Males born in England, Ireland, Scotland, and Wales. Final Report of
the Anthropometric Committee to the British Association. (Report, 1883,
p. 256.) As Measurements are stated to have been taken to the nearest
1/8th of an Inch, the Class-Intervals are here presumably 56 15/16-57 15/16,
57 15/16-58 15/16, and so on (cf. § 9). See Fig. 6.
Height            Number of Men within said Limits of Height.
without Shoes,    Place of Birth:
Inches.           England.  Scotland.  Wales.  Ireland.   Total.
57-                   1         –         1        –          2
58-                   3         …         …        …          4
59-                   …         …         …        …         14
60-                  39         …         …        …         41
61-                  70         2         9        2         83
62-                 128         9        30        2        169
63-                 320        19        48        7        394
64-                 524        47        83       15        669
65-                 740       109         …        …        990
66-                 881       139         …       58       1223
67-                 918       210       128       73       1329
68-                 886       210        72        …       1230
69-                 753       218         …        …       1063
70-                 473       115         …        …        646
71-                 254       102         …        …        392
72-                 117        69         …        …        202
73-                  48         …         …        …         79
74-                  16        15         …        …         32
75-                   …         …         …        …         16
76-                   …         …         …        …          …
77-                   …         …         …        …          …
Total                 …         …         …        …       8585
for persons born in England, Scotland, Wales, and Ireland, and
totalled in the last column. These frequency-distributions are
approximately of the symmetrical type. The frequency-polygon
for the totals given by the last column of the table is shown
in fig. 6. The student will notice that an error of 1/16 inch,
scarcely appreciable in the diagram on its reduced scale, is neglected
in the scale shown on the base-line, the intervals being treated
as if they were 57-58, 58-59, etc. Diagrams should be drawn for
comparison showing, to a good open scale, the separate distributions
for England, Scotland, Wales, and Ireland.

        <pb n="109" />
Fig. 5.—An ideal symmetrical Frequency-distribution.
Fig. 6.—Frequency-distribution of Stature for 8585 Adult Males born in the British Isles. (Table VI.) [Horizontal scale: stature in inches.]

        <pb n="110" />

Table VII. gives two similar distributions from more recent
investigations, relating respectively to sons over 18 years of
age, with parents living, in Great Britain, and to students at
Cambridge. The polygons are shown in figs. 7 and 8. Both these
distributions are more irregular than that of fig. 6, but, roughly
speaking, they may all be held to be approximately symmetrical.

14. The moderately asymmetrical distribution, the class-fre-
quencies decreasing with markedly greater rapidity on one side of
the maximum than on the other, as in fig. 9 (a) or (b). This is
the most common of all smooth forms of frequency-distribution,
illustrations occurring in statistics from almost every source. The
distribution of death-rates in the registration districts of England
TABLE VII.—Showing the Frequency-distribution of Statures for (1) 1078
English Sons (Karl Pearson, Biometrika, ii., 1903, p. 415); (2) for 1000
Male Students at Cambridge (W. R. Macdonell, Biometrika, i., 1902,
p. 220). See Figs. 7 and 8.

                  Number of Men within said
                  Limits of Stature.
Stature in        (1)               (2)
Inches.           English Sons.     Cambridge Students.
59·5-60·5            2·0               –
60·5-61·5            1·5               –
61·5-62·5            3·5               4·0
62·5-63·5           20·5              19·0
63·5-64·5           38·5              24·5
64·5-65·5           61·5              40·5
65·5-66·5           89·5              84·5
66·5-67·5          148·0             123·5
67·5-68·5          173·5             139·0
68·5-69·5          149·5             179·0
69·5-70·5          128·0             138·5
70·5-71·5          108·0             108·0
71·5-72·5           63·0              53·5
72·5-73·5           42·0              47·5
73·5-74·5           29·0              21·0
74·5-75·5            8·5              12·0
75·5-76·5            4·0               5·0
76·5-77·5            4·0               0·5
77·5-78·5            3·0               –
78·5-79·5            0·5               –
Total             1078              1000
        <pb n="111" />
Fig. 7.—Frequency-distribution of Stature for 1078 “English Sons.” (Table VII.) [Horizontal scale: stature in inches.]
Fig. 8.—Frequency-distribution of Stature for 1000 Cambridge Students. (Table VII.) [Horizontal scale: stature in inches.]

        <pb n="112" />
and Wales, given in Table I., p. 77, is a somewhat rough example
of the type. The distribution of rates of pauperism in the same
districts (Table VIII. and fig. 10) is smoother and more like the
type (a) of fig. 9. The frequency attains a maximum for

Fig. 9.—Ideal distributions of the moderately asymmetrical form: (a) and (b).
Fig. 10.—Frequency-distribution of Pauperism (Percentage of the Population
in Receipt of Poor-law Relief) on 1st January 1891 in the Registration
Districts of England and Wales: 632 Districts. (Table VIII.) [Horizontal scale: percentage of the population in receipt of relief.]
        <pb n="113" />

districts with 2¾ to 3¼ per cent. of the population in receipt of
relief, and then tails off slowly to unions with 6, 7, and 8 per
cent. of pauperism.

TABLE VIII.—Showing the Number of Registration Districts in England and
Wales with Different Percentages of the Population in receipt of Poor-law
Relief on the 1st January 1891. (Yule, Jour. Roy. Stat. Soc., vol. lix.,
1896, p. 347, q.v. for distributions for earlier years.) See Fig. 10.

Percentage of the        Number of Districts with
Population in            given Percentage in
receipt of Relief.       receipt of Relief.
0·75-1·25                    18
1·25-1·75                    43
1·75-2·25                     …
2·25-2·75                     …
2·75-3·25                   100
3·25-3·75                     …
3·75-4·25                     …
4·25-4·75                     …
4·75-5·25                     …
5·25-5·75                     …
5·75-6·25                     …
6·25-6·75                     …
6·75-7·25                     …
7·25-7·75                     …
7·75-8·25                     …
8·25-8·75                     …
Total                       632
While the distribution of stature is in general symmetrical, that
of weight is asymmetrical or skew, the greater frequencies lying
towards the lower end of the range. This is shown very well by
the data (Table IX. and fig. 11) collected by the same British
Association Committee, from the Report of which the data as to
stature were cited in the last section. As in the case of the stature
diagram (fig. 6), the small error of ½ lb. has been neglected, for
the sake of brevity, in lettering the base-line of fig. 11, the classes
being treated as if they were 90 lb.-100 lb., 100 lb.-110 lb.,
and so on.
Table X. and fig. 12 give a biological illustration, viz. the
distribution of fecundity (ratio of yearling foals produced to
coverings) in mares. The student should notice the difficulty

        <pb n="114" />
Fig. 11.—Frequency-distribution of Weight for 7749 Adult Males in the British Isles. (Table IX.) [Horizontal scale: weight in lbs.]
Fig. 12.—Frequency-distribution of Fecundity for Brood-mares; 2000 observations. (Table X.) [Horizontal scale: ratio of yearling foals produced to coverings.]
        <pb n="115" />

TABLE IX.—Showing the Frequency-distribution of Weights for Adult Males
born in England, Ireland, Scotland, and Wales. (Loc. cit., Table VI.)
Weights were taken to the nearest pound, consequently the true
Class-Intervals are 89·5-99·5, 99·5-109·5, etc. (§ 9).

Weight       Number of Men within given Limits of Weight.
in lbs.      Place of Birth:
             England.  Scotland.  Wales.  Ireland.   Total.
 90-             2         –         –        –          2
100-            26         2         …        …         34
110-           133         …        10        1        152
120-           338        22        23        …        390
130-           694        63        68       42        867
140-          1240       173       153       57       1623
150-          1075       255       178        …       1559
160-           881       275       134        …       1326
170-           492       168       102        …        787
180-           304       125         …        …        476
190-           174         …         …        …        263
200-             …         …         …        …        107
210-             …         …         …        …         85
220-             …         …         …        …         41
230-             …         …         …        …          …
240-             …         …         …        …          …
250-             …         …         …        …          …
260-             …         …         …        …          …
270-             …         …         …        …          …
280-             …         …         …        …          …
Total            …         …         …        …       7749

of classification in this case: the class-interval chosen throughout
the middle of the range is 1/15th, but the last interval is
“29/30-1.” This is not a whole interval, but it is more than a
half, for all the cases of complete fecundity are reckoned into the
class. In the diagram (fig. 12) it has been reckoned as a whole
class, and this gives a smooth distribution.

To take an illustration from meteorology, the distribution of
barometer heights at any one station over a period of time is, in
general, asymmetrical, the most frequent heights lying towards the
upper end of the range for stations in England and Wales.
Table XI. and fig. 13 show the distribution for daily observations
at Southampton during the years 1878-90 inclusive.

The distributions of Tables VIII.-XI. all follow more or less the
type of fig. 9 (a), the frequency tailing off, at the steeper end of

        <pb n="116" />

TABLE X.—Showing the Frequency-distribution of Fecundity, i.e. the Ratio
of the Number of Yearling Foals produced to the Number of Coverings,
for Brood-mares (Race-horses) Covered Eight Times at Least. (Pearson,
Lee, and Moore, Phil. Trans., A, vol. cxcii. (1899), p. 303.) See Fig. 12.

                Number of Mares                    Number of Mares
                with Fecundity                     with Fecundity
Fecundity.      between the        Fecundity.      between the
                Given Limits.                      Given Limits.
 1/30- 3/30         2              17/30-19/30        315
 3/30- 5/30         7·5            19/30-21/30        337
 5/30- 7/30        11·5            21/30-23/30        293·5
 7/30- 9/30        21·5            23/30-25/30        204
 9/30-11/30        55              25/30-27/30        127
11/30-13/30       104·5            27/30-29/30         49
13/30-15/30       182              29/30-1             19
15/30-17/30       271·5
                                   Total             2000

TABLE XI.—Showing the Frequency-distribution of Barometer Heights for
Daily Observations during the Thirteen Years 1878-1890 at Southampton.
(Karl Pearson and A. Lee, Phil. Trans., A, vol. cxc. (1897), p. 428, q.v.
for numerous other distributions.) See Fig. 13.

                 Number of Days                      Number of Days
Height of        on which Height     Height of       on which Height
Barometer        was observed        Barometer       was observed
in Inches.       between the         in Inches.      between the
                 Given Limits.                       Given Limits.
28·45-28·55          1               29·85-29·95        548·5
28·55-28·65          …               29·95-30·05        602·5
28·65-28·75          …               30·05-30·15        619·5
28·75-28·85          …               30·15-30·25        500
28·85-28·95          8·5             30·25-30·35        382
28·95-29·05         13·5             30·35-30·45        237·5
29·05-29·15         21·5             30·45-30·55        189·5
29·15-29·25         37               30·55-30·65         88·5
29·25-29·35         79               30·65-30·75         43·5
29·35-29·45        108               30·75-30·85          …
29·45-29·55        181·5             30·85-30·95          …
29·55-29·65        254·5             30·95-31·05          …
29·65-29·75        348·5
29·75-29·85        463·5             Total             4748

        <pb n="117" />
Fig. 13.—Frequency-distribution of Barometer Heights at Southampton: 4748 observations. (Table XI.) [Horizontal scale: height in inches.]
Fig. 14.—Frequency-distribution of Deaths from Diphtheria at different Ages in England and Wales, 1891-1900. (Table XII.) [Horizontal scale: years of age.]

        <pb n="118" />

the distribution, in such a way as to suggest that the ideal
curve is tangential to the base. Cases of greater asymmetry,
suggesting an ideal curve that meets the base (at one end) at a
finite angle, even a right angle, as in fig. 9 (b), are less frequent,
but occur occasionally. The distribution of deaths from diphtheria,
according to age, affords one such example of a more asymmetrical
kind. The actual figures for this case are given in Table XII., and
illustrated by fig. 14; and it will be seen that the frequency of
deaths reaches a maximum for children aged “3 and under 4,”
the number rising very rapidly to the maximum, and thence
falling so slowly that there is still an appreciable frequency for
persons over 60 or 70 years of age.

TABLE XII.—Showing the Numbers of Deaths from Diphtheria at Different
Ages in England and Wales during the Ten Years 1891-1900. (Supple-
ment to 65th Annual Report of the Registrar-General, 1891-1900, p. 3.)
See Fig. 14.

                       Number of Deaths
Age in Years.          between Given         Number
                       Limits of Age.        per Annum.
Under 1 year                4,186               4,186
1-                         10,491              10,491
2-                         11,218              11,218
3-                         12,390              12,390
4-                         11,194              11,194
5-                         23,348               4,670
10-                         4,092                 818
15-                         1,123                 225
20-                           585
25-                           786
35-                           512
45-                           324
55-                           260
65-                           127
75 and upwards                 35
Total                      80,671

15. The extremely asymmetrical, or “J-shaped,” distribution, the class-frequencies running up to a maximum at one end of the range, as in fig. 15.

This may be regarded as the extreme form of the last distribution, from which it cannot always be distinguished by elementary methods if the original data are not available. If, for instance, the frequencies of Table XII. had been given by five-year intervals
        <pb n="119" />
only, they would have run 49,479, 23,348, 4,092, and so on, thus suggesting a maximum number of deaths at the beginning of life, i.e. a distribution of the present type. It is only the analysis of the deaths in the earlier years of life by one-year intervals which shows that the frequency reaches a true maximum in the fourth year, and therefore that the distribution is of the moderately asymmetrical type. In practical cases no hard and fast line can always be drawn between the moderately and extremely asymmetrical types, any more than between the moderately asymmetrical and the symmetrical type.

Fig. 15.—An ideal Distribution of the extreme Asymmetrical Form.
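The effect of the regrouping described above can be checked with a few lines of Python. This is a minimal sketch. The counts are those of Table XII. The dictionary names and the `modal_class` helper are illustrative choices and do not come from the source.

```python
# Deaths from diphtheria in England and Wales, 1891-1900 (Table XII).
# Keys are the lower limits of single years of age: 0 means "under 1 year".
single_years = {0: 4186, 1: 10491, 2: 11218, 3: 12390, 4: 11194}

# The same deaths grouped by five-year intervals only, as in the text.
five_year = {
    "0-5": sum(single_years.values()),
    "5-10": 23348,
    "10-15": 4092,
}


def modal_class(counts):
    """Return the key of the class with the greatest frequency."""
    return max(counts, key=counts.get)


print(five_year["0-5"])           # 49479
print(modal_class(five_year))     # 0-5: the maximum appears to lie at birth
print(modal_class(single_years))  # 3: the true maximum is "3 and under 4"
```

Only the choice of grouping changes between the last two lines. The data are the same.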

In economic statistics this form of distribution is particularly characteristic of the distribution of wealth in the population at large, as illustrated, e.g., by income tax and house valuation returns, by returns of the size of agricultural holdings, and so on (cf. ref. 4). The distributions may possibly be a very extreme case of the last type; but if the maximum is not absolutely at the lower end of the
        <pb n="120" />
range, it is very close indeed thereto. Official returns do not usually give the necessary analysis of the frequencies at the lower end of the range to enable the exact position of the maximum to be determined; and for this reason the data on which Table XIII. is founded, though of course very unreliable, are of some interest. It will be seen from the table and fig. 16 that with the given classification the distribution appears clearly assignable to the present type, the number of estates between zero and £100 in annual value being more than six times as great as the number between £100 and £200 in annual value, and the frequency continuously falling as the value increases. A close analysis of the first class suggests, however, that the greatest frequency does not occur actually at zero, but that there is a true maximum frequency for estates of about £1 15 0 in annual value. The distribution might therefore be more correctly assigned to the second type, but the position of the greatest frequency indicates a
TABLE XIII.—Showing the Numbers and Annual Values of the Estates of those who had taken part in the Jacobite Rising of 1715. (Compiled from Cosin’s Names of the Roman Catholics, Nonjurors, and others who refused to take the Oaths to his late Majesty King George, etc.; London, 1745. Figures of very doubtful absolute value. See a note in Southey’s Commonplace Book, vol. i., p. 578, quoted from the Memoirs of T. Hollis.) See Fig. 16.

Annual Value in £100. | Number of Estates. | Annual Value in £100. | Number of Estates.
0–1 | 1726.5 | 17–18 | [illegible]
1–2 | 280 | |
2–3 | 140.5 | 20–21 | [illegible]
3–4 | 87 | 21–22 | [illegible]
4–5 | 46.5 | 22–23 | [illegible]
5–6 | 42.5 | 23–24 | [illegible]
6–7 | 29.5 | |
7–8 | 25.5 | 27–28 | [illegible]
8–9 | 18.5 | |
9–10 | 21 | 31–32 | [illegible]
10–11 | 11.5 | |
11–12 | 9.5 | 39–40 | [illegible]
12–13 | [illegible] | |
13–14 | 3.5 | 45–46 | [illegible]
14–15 | [illegible] | |
15–16 | [illegible] | 48–49 | [illegible]
16–17 | [illegible] | |
Total | 2476 | |
        <pb n="121" />
degree of asymmetry that is high even compared with the asymmetry of fig. 14: the distribution of numbers of deaths from diphtheria would more closely resemble the distribution of estate-values if the maximum occurred in the fourth and fifth weeks of life instead of in the fourth year.

Fig. 16.—Frequency-distribution of the Annual Values of certain Estates in England in 1715: 2476 Estates. (Table XIII.) [Figure; horizontal axis: Annual value in £100.]

The figures of Table IV., p. 83, showing the annual value and number of dwelling-houses,
        <pb n="122" />
afford a good illustration of this form of distribution, but marred
by the unequal intervals so common in official returns.
TABLE XIV.—Showing the Frequencies of Different Numbers of Petals for Three Series of Ranunculus bulbosus. (H. de Vries, Ber. dtsch. bot. Ges., Bd. xii., 1894, q.v. for details.) See Fig. 17.

Number of Petals. | Frequency: Series A. | Series B. | Series C.
5 | 312 | 345 | 133
6 | 17 | 24 | 55
7 | 7 | [illegible] | 23
8 | — | [illegible] | 7
9 | [illegible] | [illegible] | [illegible]
10 | [illegible] | [illegible] | [illegible]
11 | [illegible] | [illegible] | [illegible]
Total | 337 | 380 | 222

The type is not very frequent in other classes of material, but instances occur here and there. Table XIV. and fig. 17 show distributions of this form for the petals of the buttercup, Ranunculus bulbosus.

Fig. 17.—Frequency-distributions of Numbers of Petals for Three Series of Ranunculus bulbosus: A 337, B 380, C 222 observations. (Table XIV.)
16. The U-shaped distribution, exhibiting a maximum frequency
        <pb n="123" />
at the ends of the range and a minimum towards the centre. The ideal form of the distribution is illustrated by fig. 18.

Fig. 18.—An ideal Distribution of the U-shaped Form.

TABLE XV.—Showing the Frequencies of Estimated Intensities of Cloudiness at Breslau during the Ten Years 1876-85. (See ref. 2.) See Fig. 19.

Cloudiness. | Frequency. | Cloudiness. | Frequency.
0 | [illegible] | 6 | 21
1 | [illegible] | 7 | 71
2 | [illegible] | 8 | 194
3 | [illegible] | 9 | 117
4 | [illegible] | 10 | 2089
5 | [illegible] | Total | [illegible]

This is a rare but interesting form of distribution, as it stands in somewhat marked contrast to the preceding forms. Table XV. and fig. 19 illustrate an example based on a considerable number of observations, viz. the distribution of degrees of cloudiness, or estimated percentage of the sky covered by cloud, at Breslau
        <pb n="124" />
during the years 1876-85. A sky completely, or almost completely, overcast at the time of observation is the most common, a practically clear sky comes next, and intermediates are more rare.

Fig. 19.—Frequency-distribution of Degrees of Cloudiness at Breslau, 1876-85: 8653 observations. (Table XV.) [Figure; horizontal axis: Cloudiness.]

TABLE XVI.—Showing the Percentages of Deaf-mutes among Children of Parents one of whom at least was a Deaf-mute, for Marriages producing Five Children or more. (Compiled from material in Marriages of the Deaf in America, ed. E. A. Fay, Volta Bureau, Washington, 1898.)

Percentage of Deaf-mutes. | Number of Families. | Percentage of Deaf-mutes. | Number of Families.
0–20 | 220 | 60–80 | 5.5
20–40 | 20.5 | 80–100 | 15
40–60 | 12 | |
Total | 273 | |

This form of distribution appears to be sometimes exhibited by the percentages of offspring possessing a certain attribute when one at least of the parents also possesses the attribute. The remarks of Sir Francis Galton in Natural Inheritance suggest such a form for the distribution of “consumptivity” amongst the offspring of consumptives, but the figures are not in a decisive shape. Table XVI. gives the distribution for an analogous case, viz. the
        <pb n="125" />
distribution of deaf-mutism amongst the offspring of parents one of whom at least was a deaf-mute. In general less than one-fifth of the children are deaf-mutes: at the other end of the range, the cases in which over 80 per cent. of the children are deaf-mutes are nearly three times as many as those in which the percentage lies between 60 and 80. The numbers are, however, too small to form a very satisfactory illustration.

REFERENCES.
(1) PEARSON, KARL, “Skew Variation in Homogeneous Material,” Phil. Trans. Roy. Soc., Series A, vol. clxxxvi. (1895), pp. 343-414.
(2) PEARSON, KARL, “Cloudiness: Note on a Novel Case of Frequency,” Proc. Roy. Soc., vol. lxii. (1897), p. 287.
(3) PEARSON, KARL, “Supplement to a Memoir on Skew Variation,” Phil. Trans. Roy. Soc., Series A, vol. cxcvii. (1901), pp. 443-459.
(4) PARETO, VILFREDO, Cours d’économie politique; 2 vols., Lausanne, 1896-7. See especially tome ii., livre iii., chap. i., “La courbe des revenus.”
The first three memoirs above are mathematical memoirs on the theory of ideal frequency-curves, the first being the fundamental memoir, and the second and third supplementary. The elementary student may, however, refer to them with advantage, on account of the large collection of frequency-distributions which is given, and from which some of the illustrations in the preceding chapter have been cited. Without attempting to follow the mathematics, he may also note that each of our rough empirical types may be divided into several sub-types, the theoretical division into types being made on different grounds.
The fourth work is cited on account of the author’s discussion of the distribution of wealth in a community, to which reference was made in § 15.
In connection with the remarks in § 6 on the grouping of ages, reference may be made to the following, in which a different conclusion is drawn as to the best grouping:—
(5) YOUNG, ALLYN A., “A Discussion of Age Statistics,” Census Bulletin 18, Bureau of the Census, Washington, U.S.A., 1904.
Reference should also be made to the Census of England and Wales, 1911, vol. vii., “Ages and Condition as to Marriage,” especially the Report by Mr George King on the graduation of ages.
EXERCISES.

1. If the diagram fig. 6 is redrawn to scales of 300 observations per interval
to the inch and 4 inches of stature to the inch, what is the scale of observa-
tions to the square inch ?

If the scales are 100 observations per interval to the centimetre and 2 inches
of stature to the centimetre, what is the scale of observations to the
square centimetre ?

2. If fig. 10 is redrawn to scales of 25 observations per interval to the inch and
2 per cent. to the inch, what is the scale of observations to the square inch ?

If the scales are ten observations per interval to the centimetre and 1 per cent.
to the centimetre, what is the scale of observations to the square centimetre ?

3. If a frequency-polygon be drawn to represent the data of Table I., what
number of observations will the polygon show between death-rates of 16.5
and 17.5 per thousand, instead of the true number 159 ?

4. If a frequency-polygon be drawn to represent the data of Table V.,
what number of observations will the polygon show between head-breadths
5.95 and 6.05, instead of the true number 236 ?
        <pb n="126" />
CHAPTER VII.
AVERAGES.
1. Necessity for quantitative definition of the characters of a frequency-distribution—2. Measures of position (averages) and of dispersion—3. The dimensions of an average the same as those of the variable—4. Desirable properties for an average to possess—5. The commoner forms of average—6-13. The arithmetic mean: its definition, calculation, and simpler properties—14-18. The median: its definition, calculation, and simpler properties—19-20. The mode: its definition and relation to mean and median—21. Summary comparison of the preceding forms of average—22-26. The geometric mean: its definition, simpler properties, and the cases in which it is specially applicable—27. The harmonic mean: its definition and calculation.
1. In § 2 of the last chapter it was pointed out that a classification
of the observations in any long series is the first step necessary
to make the observations comprehensible, and to render possible
those comparisons with other series which are essential for any
discussion of causation. Very little experience, however, would
show that classification alone is not an adequate method, seeing
that it only enables qualitative or verbal comparisons to be made.
The next step that it is desirable to take is the quantitative
definition of the characters of the frequency-distribution, so that
quantitative comparisons may be made between the corresponding
characters of two or more series. It might seem at first sight
that very difficult cases of comparison could arise in which, for
example, we had to contrast a symmetrical distribution with a “J-shaped”
distribution. As a matter of practice, however, we seldom
have to deal with such a case; distributions drawn from similar
material are, in general, of similar form. When we have to
compare the frequency-distributions of stature in two races of
man, of the death-rates in English registration districts in two
successive decades, of the numbers of petals in two races of the
same species of Ranunculus, we have only to compare with each
other two distributions of the same or nearly the same type.
2. Confining our attention, then, to this simple case, there are two fundamental characteristics in which such distributions may
        <pb n="127" />
differ: (1) they may differ markedly in position, i.e. in the value of the variable round which they centre, as in fig. 20, A; or (2) they may centre round the same value, but differ in the range of variation or dispersion, as it is termed, as in fig. 20, B. Of course, the distributions may differ in both characters at once, as in fig. 20, C, but the two properties may be considered independently. Measures of the first character, position, are generally known as averages; measures of the second are termed measures of dispersion. In addition to these two principal and fundamental characters, we may also take a third of some interest but of much less importance, viz. the degree of asymmetry of the distribution.

Fig. 20. [Figure; panels A, B and C.]

The present chapter deals only with averages; measures of dispersion are considered in Chapter VIII., and measures of asymmetry are also briefly discussed at the end of that chapter.
3. In whatever way an average is defined, it may be as well to note, it is merely a certain value of the variable, and is therefore necessarily of the same dimensions as the variable: i.e. if the variable be a length, its average is a length; if the variable be a percentage, its average is a percentage, and so on. But there are several different ways of approximately defining the position of a frequency-distribution, that is, there are several different forms of average, and the question therefore arises, By what criteria are we to judge the relative merits of different forms? What are, in fact, the desirable properties for an average to possess?
        <pb n="128" />

4. (a) In the first place, it almost goes without saying that an
average should be rigidly defined, and not left to the mere estimation
of the observer. An average that was merely estimated would
depend too largely on the observer as well as the data. (b) An
average should be based on all the observations made. If not,
it is not really a characteristic of the whole distribution. (c) It
is desirable that the average should possess some simple and
obvious properties to render its general nature readily compre-
hensible : an average should not be of too abstract a mathematical
character. (d) It is, of course, desirable that an average should
be calculated with reasonable ease and rapidity. Other things
being equal, the easier calculated is the better of two forms of
average. At the same time too great weight must not be attached
to mere ease of calculation, to the neglect of other factors. (e)
It is desirable that the average should be as little affected as
may be possible by what we have termed fluctuations of sampling.
If different samples be drawn from the same material, however
carefully they may be taken, the averages of the different samples
will rarely be quite the same, but one form of average may show
much greater differences than another. Of the two forms, the
more stable is the better. The full discussion of this condition
must, however, be postponed to a later section of this work
(Chap. XVII.). (f) Finally, by far the most important desideratum
is this, that the measure chosen shall lend itself readily to
algebraical treatment. If, e.g., two or more series of observations
on similar material are given, the average of the combined series
should be readily expressed in terms of the averages of the
component series: if a variable may be expressed as the sum of
two or more others, the average of the whole should be readily
expressed in terms of the averages of its parts. A measure for
which simple relations of this kind cannot be readily determined
is likely to prove of somewhat limited application.

5. There are three forms of average in common use, the
arithmetic mean, the median, and the mode, the first named being
by far the most widely used in general statistical work. To
these may be added the geometric mean and the harmonic mean,
more rarely used, but of service in special cases. We will con-
sider these in the order named.

6. The arithmetic mean.—The arithmetic mean of a series of values of a variable $X_1, X_2, X_3, \ldots X_N$, N in number, is the quotient of the sum of the values by their number. That is to say, if M be the arithmetic mean,

$$M = \frac{1}{N}\left(X_1 + X_2 + X_3 + \cdots + X_N\right)$$
        <pb n="129" />
or, to express it more briefly by using the symbol Σ to denote “the sum of all quantities like,”

$$M = \frac{1}{N}\Sigma(X) \tag{1}$$
The word mean or average alone, without qualification, is very
generally used to denote this particular form of average: that
is to say, when anyone speaks of “the mean” or “the average”
of a series of observations, it may, as a rule, be assumed that the
arithmetic mean is meant. It is evident that the arithmetic
mean fulfils the conditions laid down in (a) and (b) of § 4, for it
is rigidly defined and based on all the observations made.
Further, it fulfils condition (c), for its general nature is readily
comprehensible. If the wages-bill for N workmen is £P, the
arithmetic mean wage, P/N pounds, is the amount that each
would receive if the whole sum available were divided equally
between them: conversely, if we are told that the mean wage
is £M, we know this means that the wages-bill is NM pounds.
Similarly, if N families possess a total of C children, the mean
number of children per family is C/N, the number that each
family would possess if the children were shared uniformly.
Conversely, if the mean number of children per family is M, the
total number of children in N families is NM. The arithmetic
mean expresses, in fact, a simple relation between the whole
and its parts.
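Here is a minimal Python sketch of equation (1) and of the wages-bill relation above. The wage figures are hypothetical and are chosen only to keep the arithmetic easy to follow.

```python
def arithmetic_mean(values):
    """Equation (1): the sum of the values divided by their number N."""
    if not values:
        raise ValueError("at least one observation is required")
    return sum(values) / len(values)


# Hypothetical weekly wages, in shillings, of N = 5 workmen.
wages = [30, 32, 28, 35, 25]
n = len(wages)
m = arithmetic_mean(wages)

print(m)      # 30.0, the share of each if the bill were divided equally
print(n * m)  # 150.0, the whole wages-bill recovered as N times M
```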

7. As regards simplicity of calculation, the mean takes a high
position. In the cases just cited, it will be noted that the mean
is actually determined without even the necessity of determining
or noting all the individual values of the variable: to get the
mean wage we need not know the wages of every hand, but only
the wages-bill ; to get the mean number of children per family
we need not know the number in each family, but only the total.
If this total is not given, but we have to deal with a moderate
number of observations—so few (say 30 or 40) that it is hardly
worth while compiling the frequency-distribution—the arithmetic
mean is calculated directly as suggested by the definition, i.e.
all the values observed are added together and the total divided
by the number of observations. But if the number of observations
be large, this direct process becomes a little lengthy. It may
be shortened considerably by forming the frequency-table and
treating all the values in each class as if they were identical with
the mid-value of the class-interval, a process which in general
gives an approximation that is quite sufficiently exact for practical purposes if the class-interval has been taken moderately
        <pb n="130" />
small (cf. Chap. VI. § 5). In this process each class-frequency is multiplied by the mid-value of the interval, the products added together, and the total divided by the number of observations. If f denote the frequency of any class, and X the mid-value of the corresponding class-interval, the value of the mean so obtained may be written—

$$M = \frac{1}{N}\Sigma(f.X) \tag{2}$$

8. But this procedure is still further abbreviated in practice by the following artifices:—(1) The class-interval is treated as the unit of measurement throughout the arithmetic; (2) the difference between the mean and the mid-value of some arbitrarily chosen class-interval is computed instead of the absolute value of the mean.

If A be the arbitrarily chosen value and

$$X = A + \xi \tag{3}$$

then

$$\Sigma(f.X) = \Sigma(f.A) + \Sigma(f.\xi),$$

or, since A is a constant, so that Σ(f.A) = N.A,

$$M = A + \frac{1}{N}\Sigma(f.\xi) \tag{4}$$

The calculation of Σ(f.X) is therefore replaced by the calculation of Σ(f.ξ). The advantage of this is that the class-frequencies need only be multiplied by small integral numbers; for A being the mid-value of a class-interval, and X the mid-value of another, and the class-interval being treated as a unit, the ξ's must be a series of integers proceeding from zero at the arbitrary origin A. To keep the values of ξ as small as possible, A should be chosen near the middle of the range.

It may be mentioned here that Σ(ξ), or Σ(f.ξ) for the grouped distribution, is sometimes termed the first moment of the distribution about the arbitrary origin A: we shall not, however, make use of this term.
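The following Python sketch compares the direct grouped mean of equation (2) with the short method of equations (3) and (4). It uses a small hypothetical frequency-table, and the function names are our own. It is meant only to illustrate that, with the class-interval as the working unit, every choice of arbitrary origin leads to the same mean.

```python
def mean_from_mid_values(mids, freqs):
    """Equation (2): M = sum(f * X) / N, each class taken at its mid-value X."""
    n = sum(freqs)
    return sum(f * x for x, f in zip(mids, freqs)) / n


def mean_short_method(first_mid, interval, freqs, origin_index):
    """Equations (3)-(4): deviations xi counted in class-intervals from the
    mid-value A of the class at position origin_index."""
    n = sum(freqs)
    a = first_mid + origin_index * interval
    # xi is an integer: 0 at the origin class, -1 and +1 next to it, and so on.
    sum_f_xi = sum(f * (i - origin_index) for i, f in enumerate(freqs))
    # sum_f_xi / n is M - A in class-intervals; multiplying by the interval
    # converts it into units before it is added to A.
    return a + interval * sum_f_xi / n


# Hypothetical grouped data: mid-values 10, 12, ..., 18 (interval 2).
mids = [10, 12, 14, 16, 18]
freqs = [3, 7, 12, 6, 2]

print(round(mean_from_mid_values(mids, freqs), 2))  # 13.8
for origin in range(len(freqs)):
    # 13.8 whichever class is chosen as the arbitrary origin
    print(origin, round(mean_short_method(10, 2, freqs, origin), 2))
```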

9. The process is illustrated by the following example, using the frequency-distribution of Table VIII., Chap. VI. The arbitrary origin A is taken at 3.5 per cent., the middle of the sixth class-interval from the top of the table, and a little nearer than the middle of the range to the estimated position of the mean. The consequent values of ξ are then written down as in column (3) of the table, against the corresponding frequencies, the values starting, of course, from zero opposite 3.5 per cent. Each frequency f is then multiplied by its ξ and the products entered
        <pb n="131" />
in another column (4). The negative and positive products are totalled separately, giving totals −776 and +509 respectively, whence Σ(f.ξ) = −267. Dividing this by N, viz. 632, we have the difference of M from A in class-intervals, viz. −0.42 intervals, that is −0.21 per cent. Hence the mean is 3.5 − 0.21 = 3.29 per cent.

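CALCULATION OF THE MEAN: Example I.—Calculation of the Arithmetic Mean of the Percentages of the Population in receipt of Relief, from the Figures of Table VIII., Chap. VI., p. 93.

(1) Mid-values of the Class-intervals (Percentage in receipt of Relief). | (2) Frequency f. | (3) Deviation from Arbitrary Value A, ξ. | (4) Product f.ξ.
1.0 | 18 | −5 | −90
1.5 | 48 | −4 | −192
2.0 | 72 | −3 | −216
2.5 | 89 | −2 | −178
3.0 | 100 | −1 | −100
3.5 | [illegible] | 0 | 0
4.0 | [illegible] | +1 | +75
4.5 | [illegible] | +2 | +120
5.0 | [illegible] | +3 | +120
5.5 | [illegible] | +4 | +84
6.0 | [illegible] | +5 | [illegible]
6.5 | [illegible] | +6 | [illegible]
7.0 | [illegible] | +7 | [illegible]
7.5 | [illegible] | +8 | [illegible]
8.5 | [illegible] | +10 | [illegible]
Total | 632 | | −776, +509

Σ(f.ξ) = +509 − 776 = −267
M − A = −267/632 class-intervals = −0.42 class-intervals
= −0.21 units
∴ mean M = 3.5 − 0.21 = 3.29 per cent.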
It must always be remembered that Σ(f.ξ)/N gives the value of M − A in class-intervals, and must not be added directly to A unless the interval is also a unit. In the present illustration the
        <pb n="132" />
interval is half a unit, and accordingly the quotient 267/632 is halved in order to obtain an answer in units. Care must also be taken to give the right sign to the quotient.
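The arithmetic of Example I can be replayed from its printed totals alone. This sketch assumes only the figures quoted in the text: A = 3.5, an interval of 0.5, N = 632, and the two product totals. It also computes the slip the text warns against, which is adding the quotient to A without converting class-intervals to units.

```python
# Totals quoted in Example I (pauperism, Table VIII., Chap. VI.).
A = 3.5               # arbitrary origin: mid-value of the sixth class, per cent.
interval = 0.5        # width of a class-interval, per cent.
N = 632               # number of observations
sum_f_xi = 509 - 776  # positive total minus negative total, in class-intervals

shift = sum_f_xi / N      # M - A, still measured in class-intervals
M = A + interval * shift  # converted to per cent. before adding to A
wrong = A + shift         # the slip: the interval has been forgotten

print(sum_f_xi)         # -267
print(round(shift, 2))  # -0.42
print(round(M, 2))      # 3.29
print(round(wrong, 2))  # 3.08
```

The variable `wrong` is included only to show how large that slip would be in this example.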

10. As the process is an important one we give a second illustration from the figures of Table VI., Chap. VI. In this case the class-interval is a unit (1 inch), so the value of M − A is given directly by dividing Σ(f.ξ) by N. The student must notice that, measures having been made to the nearest eighth of an inch, the mid-values of the intervals are 57 7/16, 58 7/16, etc., and not 57.5, 58.5, etc.
CALCULATION OF THE MEAN: Example II.—Calculation of the Arithmetic Mean Stature of Male Adults in the British Isles from the Figures of Chap. VI., Table VI., p. 88.

(1) Height, Inches. | (2) Frequency f. | (3) Deviation from Arbitrary Value A, ξ. | (4) Product f.ξ.
57– | 2 | −10 | −20
58– | 4 | −9 | −36
59– | 14 | −8 | −112
60– | 41 | −7 | −287
61– | 83 | −6 | −498
62– | 169 | −5 | −845
63– | 394 | −4 | −1576
64– | 669 | −3 | −2007
65– | 990 | −2 | −1980
66– | 1223 | −1 | −1223
67– | 1329 | 0 | 0
68– | 1230 | +1 | +1230
69– | 1063 | +2 | +2126
70– | 646 | +3 | +1938
71– | 392 | +4 | +1568
72– | 202 | +5 | +1010
73– | 79 | +6 | +474
74– | 32 | +7 | +224
75– | 16 | +8 | +128
76– | 5 | +9 | +45
77– | 2 | +10 | +20
Total | 8585 | | −8584, +8763

Σ(f.ξ) = +8763 − 8584 = +179
M − A = +179/8585 = +0.02 class-intervals or inches.
∴ M = 67 7/16 + 0.02 = 67.46 inches.
        <pb n="133" />
It is evident that an absolute check on the arithmetic of any
such calculation may be effected by taking a different arbitrary
origin for the deviations: all the figures of col. (4) will be changed,
but the value ultimately obtained for the mean must be the
same. The student should note that a classification by unequal
intervals is, at best, a hindrance to this simple form of calculation,
and the use of an indefinite interval for the extremity of the
distribution renders the exact calculation of the mean impossible
(cf. Chap. VI. § 10).
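Here is a sketch of Example II in Python, including the check by a different arbitrary origin described above. It uses exact fractions from Python's standard `fractions` module, so that every origin gives an identical result rather than values that differ only by rounding. That design choice is ours, not the text's.

```python
from fractions import Fraction

# Frequencies of Example II (stature of adult males, British Isles),
# for the classes 57-, 58-, ..., 77- inches. The frequencies of the
# 74- to 77- classes follow from the printed products in column (4):
# 224/7, 128/8, 45/9 and 20/10.
freqs = [2, 4, 14, 41, 83, 169, 394, 669, 990, 1223, 1329,
         1230, 1063, 646, 392, 202, 79, 32, 16, 5, 2]

# Heights were recorded to the nearest eighth of an inch, so the class
# "57-" has mid-value 57 7/16, not 57 1/2.
mids = [57 + k + Fraction(7, 16) for k in range(len(freqs))]


def mean_about_origin(origin_index):
    """Equation (4) with the class-interval (1 inch) as the unit."""
    a = mids[origin_index]
    n = sum(freqs)
    sum_f_xi = sum(f * (i - origin_index) for i, f in enumerate(freqs))
    return a + Fraction(sum_f_xi, n)


print(sum(freqs))                               # 8585
print(round(float(mean_about_origin(10)), 2))   # 67.46, with A = 67 7/16
# Check on the arithmetic: every other origin gives exactly the same mean.
print(all(mean_about_origin(k) == mean_about_origin(10)
          for k in range(len(freqs))))          # True
```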
Fig. 21.—Showing the Arithmetic Mean M, the Median Mi, and the Mode Mo, by verticals drawn through the corresponding points on the base, for the distribution of pauperism of fig. 10, p. 92. [Figure; horizontal axis: Percentage of the population in receipt of relief.]

11. We return again below (§ 13) to the question of the errors caused by the assumption that all values within the same interval may be treated as approximately the mid-value of the interval. It is sufficient to say here that the error is in general very small and of uncertain sign for a distribution of the symmetrical or only moderately asymmetrical type, provided of course the class-interval is not large (Chap. VI. § 5). In the case of the “J-shaped” or extremely asymmetrical distribution, however, the error is evidently of definite sign, for in all the intervals the frequency is piled up at the limit lying towards the greatest frequency, i.e. the lower end of the range in the case of the illustrations given in Chap. VI., and is not evenly distributed over the
        <pb n="134" />
interval. In distributions of such a type the intervals must be made very small indeed to secure an approximately accurate value for the mean. The student should test for himself the effect of different groupings in two or three different cases, so as to get some idea of the degree of inaccuracy to be expected.
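The grouping error described above can be seen in a small experiment. The data below are hypothetical. They are built so that frequencies fall steadily from the lower end of the range, and so that within every class the values pile up towards its lower limit. The helper name `grouped_mean` is our own.

```python
import math

# Hypothetical J-shaped data: the value (k + 0.5) / 10 occurs 40 - k times,
# for k = 0, 1, ..., 39, so frequencies fall steadily from the lower end.
data = [(k + 0.5) / 10 for k in range(40) for _ in range(40 - k)]


def grouped_mean(values, width):
    """Mean obtained by treating every value as the mid-value of its class
    [j * width, (j + 1) * width)."""
    total = 0.0
    for x in values:
        j = math.floor(x / width)
        total += (j + 0.5) * width
    return total / len(values)


exact = sum(data) / len(data)
print(round(exact, 4))  # 1.35
for width in (1.0, 0.5):
    error = grouped_mean(data, width) - exact
    print(width, round(error, 4))
# Both errors are positive, and the finer grouping gives the smaller error.
```

Repeating the experiment with other widths or other hypothetical shapes is one way to follow the text's advice to test different groupings.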

12. If a diagram has been drawn representing the frequency-distribution, the position of the mean may conveniently be indicated by a vertical through the corresponding point on the base. Thus fig. 21 (a reproduction of fig. 10) shows the frequency-polygon for our first illustration, and the vertical M indicates the mean. In a moderately asymmetrical distribution of this form the mean lies, as in the present example, on the side of the greatest frequency towards the longer “tail” of the distribution: M in fig. 22 shows similarly the position of the mean in an ideal distribution. In a symmetrical distribution the mean coincides with the centre of symmetry. The student should mark the position of the mean in the diagram of every frequency-distribution that he draws, and so accustom himself to thinking of the mean, not as an abstraction, but always in relation to the frequency-distribution of the variable concerned.

Fig. 22.—Mean M, Median Mi, and Mode Mo, of the ideal moderately asymmetrical distribution.

13. The following examples give important properties of the
arithmetic mean, and at the same time illustrate the facility of its
algebraic treatment :—

(a) The sum of the deviations from the mean, taken with their
proper signs, is zero.

This follows at once from equation (4): for if M and A are
identical, evidently Σ(f.ξ) must be zero.
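Property (a) can be checked numerically. Here is a minimal sketch with hypothetical observations whose mean is 9.

```python
def deviations_from_mean(values):
    """Deviations x - M, taken with their proper signs."""
    m = sum(values) / len(values)
    return [x - m for x in values]


# Hypothetical observations with mean 9.
observations = [4, 7, 9, 10, 15]
devs = deviations_from_mean(observations)
print(devs)       # [-5.0, -2.0, 0.0, 1.0, 6.0]
print(sum(devs))  # 0.0
```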
        <pb n="135" />

(b) If a series of N observations of a variable X consist of, say, two component series, the mean of the whole series can be readily expressed in terms of the means of the two components. For if we denote the values in the first series by $X_1$ and in the second series by $X_2$,

$$\Sigma(X) = \Sigma(X_1) + \Sigma(X_2),$$

that is, if there be $N_1$ observations in the first series and $N_2$ in the second, and the means of the two series be $M_1$, $M_2$ respectively,

$$N.M = N_1.M_1 + N_2.M_2 \tag{5}$$

For example, we find from the data of Table VI., Chap. VI.,

Mean stature of the 346 men born in Ireland = 67.78 in.
Mean stature of the 741 men born in Wales = 66.62 in.

Hence the mean stature of the 1087 men born in the two countries is given by the equation—

$$1087\,M = (346 \times 67.78) + (741 \times 66.62).$$

That is, M = 66.99 inches. It is evident that the form of the relation (5) is quite general: if there are r series of observations $X_1, X_2, \ldots X_r$, the mean M of the whole series is related to the means $M_1, M_2, \ldots M_r$ of the component series by the equation

$$N.M = N_1.M_1 + N_2.M_2 + \cdots + N_r.M_r \tag{6}$$

For the convenient checking of arithmetic, it is useful to note that, if the same arbitrary origin A for the deviations ξ be taken in each case, we must have, denoting the component series by the subscripts 1, 2, . . . r as before,

$$\Sigma(f.\xi) = \Sigma(f_1.\xi_1) + \Sigma(f_2.\xi_2) + \cdots + \Sigma(f_r.\xi_r) \tag{7}$$

The agreement of these totals accordingly checks the work.

As an important corollary to the general relation (6), it may
be noted that the approximate value for the mean obtained from
any frequency distribution is the same whether we assume (1)
that all the values in any class are identical with the mid-value
of the class-interval, or (2) that the mean of the values in the
class is identical with the mid-value of the class-interval.
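Here is equation (6) written as a small Python function and applied to the Ireland and Wales figures quoted above. The second part uses two hypothetical series to show that combining the means agrees with pooling the raw observations. The function name is our own.

```python
def combined_mean(groups):
    """Equation (6): N.M = N1.M1 + N2.M2 + ...; groups holds (N_i, M_i) pairs."""
    n = sum(n_i for n_i, _ in groups)
    return sum(n_i * m_i for n_i, m_i in groups) / n


# Ireland and Wales, as in the example of (b) (Table VI., Chap. VI.).
print(round(combined_mean([(346, 67.78), (741, 66.62)]), 2))  # 66.99

# Combining the means gives the same result as pooling the raw values.
first, second = [1, 2, 3], [10, 20]  # hypothetical series
pooled = sum(first + second) / len(first + second)
via_means = combined_mean([(len(first), sum(first) / len(first)),
                           (len(second), sum(second) / len(second))])
print(pooled, via_means)  # 7.2 7.2
```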

(c) The mean of all the sums or differences of corresponding observations in two series (of equal numbers of observations) is equal to the sum or difference of the means of the two series.

This follows almost at once. For if

$$X = X_1 \pm X_2,$$
$$\Sigma(X) = \Sigma(X_1) \pm \Sigma(X_2).$$
        <pb n="136" />
That is, if M, $M_1$, $M_2$ be the respective means,

$$M = M_1 \pm M_2 \tag{8}$$

Evidently the form of this result is again quite general, so that if

$$X = X_1 \pm X_2 \pm X_3 \pm \cdots \pm X_r,$$
$$M = M_1 \pm M_2 \pm M_3 \pm \cdots \pm M_r \tag{9}$$
As a useful illustration of equation (8), consider the case of measurements of any kind that are subject (as indeed all measures must be) to greater or less errors. The actual measurement X in any such case is the algebraic sum of the true measurement $X_1$ and an error $X_2$. The mean of the actual measurements M is therefore the sum of the true mean $M_1$ and the arithmetic mean of the errors $M_2$. If, and only if, the latter be zero, will the observed mean be identical with the true mean. Errors of grouping (§ 11) are a case in point.
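Here is equation (8) applied to the measurement example above, as a sketch with hypothetical true values and errors. The values are exact binary fractions, so the floating-point comparison is exact. The mean error here is not zero, so the observed mean exceeds the true mean by exactly that mean error.

```python
def mean(xs):
    """Arithmetic mean, equation (1)."""
    return sum(xs) / len(xs)


# Hypothetical true lengths and the errors made in measuring them.
true_values = [10.0, 12.0, 11.0, 13.0]
errors = [0.25, -0.5, 0.75, 0.5]
measured = [t + e for t, e in zip(true_values, errors)]  # X = X1 + X2

print(mean(measured))                    # 11.75
print(mean(true_values) + mean(errors))  # 11.75, i.e. M = M1 + M2
print(mean(errors))                      # 0.25, so the observed mean is biased
```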

14. The median.—The median may be defined as the middle-
most or central value of the variable when the values are ranged
in order of magnitude, or as the value such that greater and
smaller values occur with equal frequency. In the case of a
frequency-curve, the median may be defined as that value of the
variable the vertical through which divides the area of the curve
into two equal parts, as the vertical through Mi in fig. 22.

The median, like the mean, fulfils the conditions (b) and (c)
of § 4, seeing that it is based on all the observations made, and
that it possesses the simple property of being the central or
middlemost value, so that its nature is obvious. But the defini-
tion does not necessarily lead in all cases to a determinate value.
If there be an odd number of different values of X observed, say
2n+1, the (n+ 1)th in order of magnitude is the only value
fulfilling the definition. But if there be an even number, say
2n different values, any value between the nth and (n + 1)th
fulfils the conditions. In such a case it appears to be usual to
take the mean of the nth and (n+ 1)th values as the median,
but this is a convention supplementary to the definition. It
should also be noted that in the case of a discontinuous variable
the second form of the definition in general breaks down: if we
range the values in order there is always a middlemost value
(provided the number of observations be odd), but there is not, as a
rule, any value such that greater and less values occur with equal
frequency. Thus in Table III., § 3 of Chap. VI, we see that 45 per
cent. of the poppy capsules had 12 or fewer stigmatic rays, 55
per cent. had 13 or more ; similarly 61 per cent. had 13 or fewer
rays, 39 per cent. had 14 or more. There is no number of rays
        <pb n="137" />
such that the frequencies in excess and defect are equal.
In the case of the buttercups of Table XIV. (Chap. VI. § 15)
there is no number of petals that even remotely fulfils the
required condition. An analogous difficulty may arise, it may
be remarked, even in the case of an odd number of observations
of a continuous variable if the number of observations be small
and several of the observed values identical. The median is
therefore a form of average of most uncertain meaning in cases
of strictly discontinuous variation, for it may be exceeded by
5, 10, 15, or 20 per cent. only of the observed values, instead of
by 50 per cent.: its use in such cases is to be deprecated, and
is perhaps best avoided in any case, whether the variation be
continuous or discontinuous, in which small series of observations
have to be dealt with.
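The definition, and the conventional rule for an even number of values, can be sketched with Python's `statistics.median`, which applies the same convention of averaging the two middle values. The petal counts are hypothetical, chosen only to show how a discontinuous variable with many tied values defeats the second form of the definition:

```python
from statistics import median

odd = [3, 1, 4, 1, 5]        # 2n + 1 = 5 values: the 3rd in order is the median
even = [3, 1, 4, 1, 5, 9]    # 2n = 6 values: convention takes the mean of the 3rd and 4th

print(median(odd), median(even))   # 3 3.5

# A hypothetical discontinuous variable with many repeated values
# (e.g. petal counts on nine flowers).
petals = [5, 5, 5, 5, 5, 5, 6, 6, 7]
m = median(petals)
share_above = sum(x > m for x in petals) / len(petals)
share_below = sum(m > x for x in petals) / len(petals)
print(m, share_above, share_below)
```

Only a third of the counts exceed the median of 5 and none fall below it, so neither side comes near 50 per cent.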

15. When a table showing the frequency-distribution for a
long series of observations of a continuous variable is given, no
difficulty arises, as a sufficiently approximate value of the median
can be readily determined by simple interpolation on the hypo-
thesis that the values in each class are uniformly distributed
throughout the interval. Thus, taking the figures in our first
illustration of the method of calculating the mean, the total
number of observations (registration districts) is 632, of which
the half is 316. Looking down the table, we see that there are
227 districts with not more than 2.75 per cent. of the population
in receipt of relief, and 100 more with between 2.75 and 3.25
per cent. But only 89 are required to make up the total of 316:
hence the value of the median is taken as

$$2.75 + \frac{89}{100} \times 0.5 = 2.75 + 0.445 = 3.195 \text{ per cent.}$$

The mean being 3.29, the median is slightly less; its position
is indicated by Mi in fig. 21.

The value of the median stature of males may be similarly
calculated from the data of the second illustration. The work
may be indicated thus:

Half the total number of observations (8585) = 4292.5
Total frequency under 66 15/16 inches = 3589
Difference = 703.5
Frequency in next interval = 1329
703.5/1329 = 0.529
Therefore median = 66 15/16 + 0.529 = 67.47 inches.
        <pb n="138" />

The difference between median and mean in this case is
therefore only about one-hundredth of an inch, the smallness
of the difference arising from the approximate symmetry of
the distribution. In an absolutely symmetrical distribution
it is evident that mean and median must coincide.
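The interpolation rule of § 15 can be written as a small function. This is a sketch under the same assumption the text makes, that the observations in the median's class are spread uniformly over its interval. The function name and arguments are illustrative, and the inputs are the figures of the two worked examples as read here (for stature, the class limit 66 15/16 inches and cumulative frequency 3589):

```python
def interpolated_median(lower_limit, cum_below, freq_in_class, width, total):
    """Median of grouped data by simple (linear) interpolation.

    Assumes the observations in the class containing the median are spread
    uniformly over its interval, as in the text.
    """
    still_needed = total / 2 - cum_below   # observations still required to reach N/2
    return lower_limit + width * still_needed / freq_in_class

# Pauperism: 632 districts; 227 at or below 2.75 per cent., 100 more up to 3.25.
pauperism = interpolated_median(2.75, 227, 100, 0.5, 632)

# Stature: 8585 men; 3589 under 66 15/16 in., 1329 in the next one-inch class.
stature = interpolated_median(66 + 15 / 16, 3589, 1329, 1.0, 8585)

print(round(pauperism, 3), round(stature, 2))   # 3.195 67.47
```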

16. Graphical interpolation may, if desired, be substituted
for arithmetical interpolation. Taking, again, the figures of
Example i., the number of districts with pauperism not exceeding
2.25 is 138; not exceeding 2.75, 227; not exceeding 3.25, 327;
and not exceeding 3.75, 417.

[Fig. 23.—Determination of the median by graphical interpolation (horizontal axis: percentage of the population in receipt of relief; vertical axis: number of districts).]

Plot the numbers of districts with pauperism not exceeding each value X to the corresponding
value of X on squared paper, to a good large scale, as in fig. 23,
and draw a smooth curve through the points thus obtained,
preferably with the aid of one of the “curves,” splines, or flexible
curves sold by instrument-makers for the purpose. The point
in which the smooth curve so obtained cuts the horizontal line
corresponding to a total frequency N/2=316 gives the median.
In general the curve is so flat that the value obtained by this
graphical method does not differ appreciably from that calculated
arithmetically (the arithmetical process assuming that the
curve is a straight line between the points on either side of
the median); if the curvature is considerable, the graphical
value—assuming, of course, careful and accurate draughtsmanship
—is to be preferred to the arithmetical value, as it does not

        <pb n="139" />
involve the crude assumption that the frequency is uniformly
distributed over the interval in which the median lies.
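A rough computational analogue of the graphical method. Instead of a curve drawn with a spline, the sketch below passes a cubic polynomial through the four plotted points and finds, by bisection, where it reaches N/2 = 316. The cubic is an assumption standing in for the hand-drawn smooth curve; it is not the method of the text:

```python
# Cumulative frequencies from § 16: districts with pauperism not exceeding X.
xs = [2.25, 2.75, 3.25, 3.75]
cum = [138, 227, 327, 417]

def smooth_cumulative(x):
    """Cubic through the four plotted points (Lagrange form).

    A stand-in for the smooth curve drawn by hand: merely one smooth
    curve through the same points.
    """
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, cum)):
        term = float(yi)
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

def crossing(target, lo, hi, tol=1e-9):
    """Bisection for the X at which the (increasing) curve reaches target."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if target > smooth_cumulative(mid):
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

graphical_median = crossing(632 / 2, 2.75, 3.25)
print(round(graphical_median, 3))   # close to the arithmetical 3.195
```

The result lies within a few thousandths of the arithmetical value, in line with the remark that the two methods seldom differ appreciably when the curve is flat.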

17. A comparison of the calculations for the mean and
for the median respectively will show that on the score of
brevity of calculation the median has a distinct advantage.
When, however, the ease of algebraical treatment of the two
forms of average is compared, the superiority lies wholly on
the side of the mean. As was shown in § 13, when several series
of observations are combined into a single series, the mean of
the resultant distribution can be simply expressed in terms
of the means of the components. The expression of the
median of the resultant distribution in terms of the medians
of the components is, however, not merely complex and difficult,
but impossible: the value of the resultant median depends on
the forms of the component distributions, and not on their
medians alone. If two symmetrical distributions of the same
form and with the same numbers of observations, but with
different medians, be combined, the resultant median must
evidently (from symmetry) coincide with the resultant mean, i.e.
lie halfway between the means of the components. But if the
two components be asymmetrical, or (whatever their form)
if the degrees of dispersion or numbers of observations in the
two series be different, the resultant median will not coincide
with the resultant mean, nor with any other simply assignable
value. It is impossible, therefore, to give any theorem for
medians analogous to equations (5) and (6) for means. It is
equally impossible to give any theorem analogous to equations
(8) and (9) of § 13. The median of the sum or difference of
pairs of corresponding observations in two series is not,
in general, equal to the sum or difference of the medians of
the two series ; the median value of a measurement subject to
error is not necessarily identical with the true median, even
if the median error be zero, i.e. if positive and negative errors
be equally frequent.
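The contrast between mean and median in this respect can be checked on small hypothetical series. Both pairs below have components of the same sizes and the same medians, yet their combined medians differ, whereas the combined mean follows directly from the component means and sizes:

```python
from statistics import mean, median

# Hypothetical component series.
x = [1, 2, 3]
y_a = [10, 20, 30, 40, 50]
y_b = [29, 29, 30, 40, 50]   # same size and same median (30) as y_a, different form

# The combined mean follows from the component means and sizes, as in § 13.
n_x, n_y = len(x), len(y_a)
combined_mean = (n_x * mean(x) + n_y * mean(y_a)) / (n_x + n_y)
print(combined_mean, mean(x + y_a))          # 19.5 19.5

# The combined median does not follow from the component medians alone.
median_a = median(x + y_a)                   # 15.0
median_b = median(x + y_b)                   # 29.0

# Sums of corresponding observations: the means add, the medians need not.
p = [1, 2, 9]
q = [9, 2, 1]
sums = [a + b for a, b in zip(p, q)]
print(mean(sums), mean(p) + mean(q))         # 8 8
print(median(sums), median(p) + median(q))   # 10 4
```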

18. These limitations render the applications of the median in
any work in which theoretical considerations are necessary com-
paratively circumscribed. On the other hand, the median may
have an advantage over the mean for special reasons. (a) It is
very readily calculated ; a factor to which, however, as already
stated, too much weight ought not to be attached. (b) It is
readily obtained, without the necessity of measuring all the
objects to be observed, in any case in which they can be arranged
by eye in order of magnitude. If, for instance, a number of men
be ranked in order of stature, the stature of the middlemost is
the median, and he alone need be measured. (On the other hand
        <pb n="140" />

it is useless in the cases cited at the end of § 6 ; the median wage
cannot be found from the total of the wages-bill, and the total
of the wages-bill is not known when the median is given.) (c) It
is sometimes useful as a makeshift, when the observations are so
given that the calculation of the mean is impossible, owing, e.g., to
a final indefinite class, as in Table IV. (Chap. VI. § 10). (d) The
median may sometimes be preferable to the mean, owing to its
being less affected by abnormally large or small values of the
variable. The stature of a giant would have no more influence
on the median stature of a number of men than the stature of
any other man whose height is only just greater than the median.
If a number of men enjoy incomes closely clustering round a
median of £500 a year, the median will be no more affected by
the addition to the group of a man with the income of £50,000
than by the addition of a man with an income of £5000, or even
£600. If observations of any kind are liable to present occasional
greatly outlying values of this sort (whether real, or due to
errors or blunders), the median will be more stable and less
affected by fluctuations of sampling than the arithmetic mean.
(In general the mean is the less affected.) The point is discussed
more fully later (Chap. XVII.). (e) It may be added that the
median is, in a certain sense, a particularly real and natural
form of average, for the object or individual that is the median
object or individual on any one system of measuring the character
with which we are concerned will remain the median on any
other method of measurement which leaves the objects in the
same relative order. Thus a batch of eggs representing eggs
of the median price, when prices are reckoned at so much per
dozen, will remain a batch representing the median price when
prices are reckoned at so many eggs to the shilling.
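Points (d) and (e) can be illustrated with hypothetical figures. Adding one very large income moves the median no more than adding a modest one, and the median batch of eggs stays the median when prices per dozen are re-expressed as eggs per shilling. The sketch uses an odd number of egg prices because, with an even number, the conventional averaging of the two middle values would not in general survive the change of scale:

```python
from statistics import fmean, median

# Hypothetical incomes (pounds a year) clustering round a median of 500.
incomes = [480, 490, 500, 510, 520]
with_rich = incomes + [50_000]
with_modest = incomes + [600]

print(median(with_rich), median(with_modest))              # 505.0 505.0
print(round(fmean(with_rich)), round(fmean(with_modest)))  # 8750 517

# Hypothetical egg prices in pence per dozen, re-expressed as eggs per shilling
# (12 pence): eggs per shilling = 12 / (price per egg) = 144 / (pence per dozen).
pence_per_dozen = [10, 12, 12, 14, 18]
eggs_per_shilling = [144 / p for p in pence_per_dozen]

# The median batch is the same on either scale (odd number of batches)...
print(144 / median(pence_per_dozen), median(eggs_per_shilling))   # 12.0 12.0
# ...but the arithmetic mean is not.
print(round(144 / fmean(pence_per_dozen), 3), round(fmean(eggs_per_shilling), 3))
```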

19. The Mode.—The mode is the value of the variable corre-
sponding to the maximum of the ideal frequency-curve which
gives the closest possible fit to the actual distribution.

It is evident that in an ideal symmetrical distribution mean,
median and mode coincide with the centre of symmetry. If,
however, the distribution be asymmetrical, as in fig. 22, the three
forms of average are distinct, Mo being the mode, Mi the median,
and M the mean. Clearly, the mode is an important form of
average in the cases of skew distributions, though the term is of
recent introduction (Pearson, ref. 11). It represents the value
which is most frequent or typical, the value which is in fact the
fashion (la mode). But a difficulty at once arises on attempting
to determine this value for such distributions as occur in practice.
It is no use giving merely the mid-value of the class-interval into
which the greatest frequency falls, for this is entirely dependent
        <pb n="141" />
on the choice of the scale of class-intervals. It is no use making
the class-intervals very small to avoid error on that account, for
the class-frequencies will then become small and the distribution
irregular. What we want to arrive at is the mid-value of the
interval for which the frequency would be a maximum, if the
intervals could be made indefinitely small and at the same time
the number of observations be so increased that the class-frequen-
cies should run smoothly. As the observations cannot, in a
practical case, be indefinitely increased, it is evident that some
process of smoothing out the irregularities that occur in the
actual distribution must be adopted, in order to ascertain the
approximate value of the mode. But there is only one smoothing
process that is really satisfactory, in so far as every observation
can be taken into account in the determination, and that is the
method of fitting an ideal frequency-curve of given equation to
the actual figures. The value of the variable corresponding to the
maximum of the fitted curve is then taken as the mode, in
accordance with our definition. Mo in fig. 21 is the value of the
mode so determined for the distribution of pauperism, the value
2.99 being, as it happens, very nearly coincident with the centre
of the interval in which the greatest frequency lies. The deter-
mination of the mode by this—the only strictly satisfactory—
method must, however, be left to the more advanced student.
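The objection to taking the mid-value of the modal class can be seen on a small hypothetical sample: grouping the same fourteen values with a class-interval of 1, and then of 2, gives different crude modes. The helper `crude_mode` is illustrative only; it is not the curve-fitting method the text recommends:

```python
from collections import Counter

def crude_mode(values, origin, width):
    """Mid-value of the class with the greatest frequency when the values
    are grouped into intervals [origin + k*width, origin + (k+1)*width)."""
    counts = Counter(int((v - origin) // width) for v in values)
    k, _ = max(counts.items(), key=lambda item: item[1])
    return origin + (k + 0.5) * width

# Hypothetical observations of a continuous variable.
data = [1.2, 1.4, 2.1, 2.3, 2.6, 2.9, 3.1, 3.3, 3.5, 3.6, 3.8, 4.2, 4.5, 5.1]

print(crude_mode(data, 1.0, 1.0), crude_mode(data, 1.0, 2.0))   # 3.5 4.0
```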

20. At the same time there is an approximate relation between
mean, median, and mode that appears to hold good with surprising
closeness for moderately asymmetrical distributions, approaching
the ideal type of fig. 9, and it is one that should be borne in
mind as giving—roughly, at all events—the relative values of
these three averages for a great many cases with which the
student will have to deal. It is expressed by the equation—

$$\text{Mode} = \text{Mean} - 3(\text{Mean} - \text{Median}).$$

That is to say, the median lies one-third of the distance from the
mean towards the mode (compare figs. 21 and 22). For the distribution
of pauperism we have, taking the mean to three places of
decimals:

Mean 3.289
Median 3.195
Difference 0.094

Hence approximate mode = 3.289 − 3 × 0.094 = 3.007,

or 3.01 to the second place of decimals, which is sufficient accuracy
for the final result, though three decimal places must be retained
for the calculation. The true mode, found by fitting an ideal
        <pb n="142" />

distribution, is 2.99. As further illustrations of the closeness
with which the relation may be expected to hold in different cases,
we give below the results for the distributions of pauperism in
the unions of England and Wales in the years 1850, 1860, 1870,
1881, and 1891 (the last being the illustration taken above),
and also the results for the distribution of barometer heights at
Southampton (Table XI., Chap. VI. § 14), and similar distributions
at four other stations.

Comparison of the Approximate and True Modes in the Case of Five Distributions
of Pauperism (Percentages of the Population in receipt of
Relief) in the Unions of England and Wales. (Yule, Jour. Roy. Stat.
Soc., vol. lix., 1896.)

Year    Mean     Median    Approximate Mode    True Mode
1850    6.508    6.261     5.767               5.815
1860    5.195    5.000     4.610               4.657
1870    5.451    5.380     5.238               5.038
1881    3.676    3.523     3.217               3.240
1891    3.289    3.195     3.007               2.987

Comparison of the Approximate and True Modes in the Case of Five Distributions
of the Height of the Barometer for Daily Observations at the
Stations named. (Distributions given by Karl Pearson and Alice Lee,
Phil. Trans., A, vol. cxc. (1897), p. 423.)

Station        Mean      Median    Approximate Mode    True Mode
Southampton    29.981    30.000    30.038              30.039
Londonderry    29.891    29.915    29.963              29.960
Carmarthen     29.952    29.974    30.018              30.013
Glasgow        29.886    29.906    29.946              29.967
Dundee         29.870    29.890    29.930              29.951

It will be seen that in the case of the pauperism figures the
approximate mode only diverges markedly from the true value
in the year 1870, a year in which the frequency-distribution was
very irregular. In all the other years the difference between the
true and approximate values of the mode is hardly greater than
the alteration that might be caused in the true mode itself by
slight variations in the method of fitting the curve to the actual
distribution. Similar remarks apply to the second series of illus-
trations ; the true and approximate values are extremely close,
except in the case of Dundee and Glasgow, where the divergence
reaches two-hundredths of an inch.
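The rule of § 20 is easily applied in bulk. The sketch below recomputes the approximate modes from the tabulated means and medians and sets them beside the true modes. The Southampton true mode is read as 30.039, which agrees with the remark that only Dundee and Glasgow diverge by about two-hundredths of an inch:

```python
def approximate_mode(mean, median):
    """Empirical relation of § 20: Mode = Mean - 3(Mean - Median)."""
    return mean - 3 * (mean - median)

# (mean, median, true mode) as read from the two tables above.
rows = {
    "Pauperism 1850": (6.508, 6.261, 5.815),
    "Pauperism 1860": (5.195, 5.000, 4.657),
    "Pauperism 1870": (5.451, 5.380, 5.038),
    "Pauperism 1881": (3.676, 3.523, 3.240),
    "Pauperism 1891": (3.289, 3.195, 2.987),
    "Southampton": (29.981, 30.000, 30.039),
    "Londonderry": (29.891, 29.915, 29.960),
    "Carmarthen": (29.952, 29.974, 30.013),
    "Glasgow": (29.886, 29.906, 29.967),
    "Dundee": (29.870, 29.890, 29.951),
}

for name, (mean, median, true_mode) in rows.items():
    approx = approximate_mode(mean, median)
    print(f"{name:15} approx {approx:.3f}  true {true_mode:.3f}  "
          f"difference {approx - true_mode:+.3f}")
```

The printed differences make the exceptional year 1870 stand out at once.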

21. Summing up the preceding paragraphs, we may say that
the mean is the form of average to use for all general purposes;

        <pb n="143" />
it is simply calculated, its value is always determinate, its
algebraic treatment is particularly easy, and in most cases it is
rather less affected than the median by errors of sampling. The
median is, it is true, somewhat more easily calculated from a given
frequency-distribution than is the mean ; it is sometimes a useful
makeshift, and in a certain class of cases it is more and not less
stable than the mean ; but its use is undesirable in cases of discon-
tinuous variation, its value may be indeterminate, and its algebraic
treatment is difficult and often impossible. The mode, finally,
is a form of average hardly suitable for elementary use, owing
to the difficulty of its determination, but at the same time it
represents an important value of the variable. The arithmetic
mean should invariably be employed unless there is some very
definite reason for the choice of another form of average, and the
elementary student will do very well if he limits himself to its
use. Objection is sometimes taken to the use of the mean in the
case of asymmetrical frequency-distributions, on the ground that
the mean is not the mode, and that its value is consequently
misleading. But no one in the least degree familiar with the
manifold forms taken by frequency-distributions would regard the
two as in general identical ; and while the importance of the mode
is a good reason for stating its value in addition to that of the
mean, it cannot replace the latter. The objection, it may be noted,
would apply with almost equal force to the median, for, as we have
seen (§ 20), the difference between mode and median is usually
about two-thirds of the difference between mode and mean.
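The two-thirds figure follows directly from the approximate relation of § 20, which may be written Mean − Mode = 3(Mean − Median). Then

$$\text{Median} - \text{Mode} = (\text{Mean} - \text{Mode}) - (\text{Mean} - \text{Median}) = 2(\text{Mean} - \text{Median}) = \tfrac{2}{3}(\text{Mean} - \text{Mode}).$$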

22. The Geometric Mean.—The geometric mean G of a series of
values $X_1, X_2, X_3, \ldots, X_N$ is defined by the relation

$$G = \sqrt[N]{X_1 \cdot X_2 \cdot X_3 \cdots X_N} \qquad (10)$$

The definition may also be expressed in terms of logarithms,

$$\log G = \frac{1}{N}\,\Sigma(\log X) \qquad (11)$$

that is to say, the logarithm of the geometric mean of a series of
values is the arithmetic mean of their logarithms.

The geometric mean of a given series of quantities is always
less than their arithmetic mean, unless the quantities are all equal;
the student will find a proof in most text-books of algebra, and in
ref. 10. The magnitude of the difference depends largely on the
amount of dispersion of the variable in proportion to the magnitude
of the mean (cf. Chap. VIII., Question 8). The geometric mean is
necessarily zero, it should be noticed, if even a single value of X is
zero, and it may become imaginary if negative values occur.
Excluding these cases, the value of the

        <pb n="144" />

geometric mean is always determinate and is rigidly defined. The
computation is a little long, owing to the necessity of taking
logarithms: it is hardly necessary to give an example, as the
method is simply that of finding the arithmetic mean of the
logarithms of X (instead of the values of X) in accordance with
equation (11). If there are many observations, a table should be
drawn up giving the frequency-distribution of log X, and the
mean should be calculated as in Examples i. and ii. of § 9 and 10.
The geometric mean has never come into general use as a repre-
sentative average, partly, no doubt, on account of its rather
troublesome computation, but principally on account of its some-
what abstract mathematical character (cf. § 4 (c)): the geometric
mean does not possess any simple and obvious properties which
render its general nature readily comprehensible.
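Equation (11) translates directly into code. In this sketch the logarithms are common (base 10) logarithms, as in hand computation; any base gives the same G. The grouped version, which works from a frequency table of X as the text suggests, is a hypothetical helper, and both results are checked against Python's `statistics.geometric_mean`:

```python
import math
from statistics import geometric_mean

def geometric_mean_via_logs(values):
    """Equation (11): log G is the arithmetic mean of log X (all X > 0)."""
    logs = [math.log10(x) for x in values]
    return 10 ** (sum(logs) / len(logs))

def grouped_geometric_mean(values, freqs):
    """Same, from a frequency table: log G = sum(f log X) / sum(f)."""
    total_log = sum(f * math.log10(x) for x, f in zip(values, freqs))
    return 10 ** (total_log / sum(freqs))

g = geometric_mean_via_logs([2, 8, 4])              # cube root of 64, i.e. 4
g_grouped = grouped_geometric_mean([2, 8], [1, 1])  # square root of 16, i.e. 4
print(round(g, 10), round(g_grouped, 10))           # 4.0 4.0
```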

23. At the same time, as the following examples show, the
geometric mean possesses some important properties, and is readily
treated algebraically in certain cases.

(a) If the series of observations X consist of r component
series, there being $N_1$ observations in the first, $N_2$ in the second,
and so on, the geometric mean G of the whole series can be
readily expressed in terms of the geometric means $G_1$, $G_2$, etc., of
the component series. For evidently we have at once (as in § 13
(b)):

$$N \log G = N_1 \log G_1 + N_2 \log G_2 + \cdots + N_r \log G_r \qquad (12)$$

(b) The geometric mean of the ratios of corresponding observations
in two series is equal to the ratio of their geometric means.
For if

$$X = X_1/X_2,$$
$$\log X = \log X_1 - \log X_2,$$

then, summing for all pairs of $X_1$’s and $X_2$’s and dividing by N,
$\log G = \log G_1 - \log G_2$, that is,

$$G = G_1/G_2 \qquad (13)$$

(c) Similarly, if a variable X is given as the product of any
number of others, i.e. if

$$X = X_1 \cdot X_2 \cdot X_3 \cdots X_r,$$

$X_1, X_2, \ldots, X_r$ denoting corresponding observations in r
different series, the geometric mean G of X is expressed in terms
of the geometric means $G_1, G_2, \ldots, G_r$ of $X_1, X_2, \ldots, X_r$ by
the relation

$$G = G_1 \cdot G_2 \cdot G_3 \cdots G_r \qquad (14)$$
That is to say, the geometric mean of the product is the product
of the geometric means.
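Relations (12), (13) and (14) can be verified numerically with Python's `statistics.geometric_mean`. The series are hypothetical, with positive values throughout so that every logarithm exists:

```python
import math
from statistics import geometric_mean

# (12) A series made up of two component series (hypothetical values).
x1 = [2.0, 4.0, 8.0]
x2 = [1.0, 2.0, 16.0, 4.0]
n1, n2 = len(x1), len(x2)
g_whole = geometric_mean(x1 + x2)
g_parts = math.exp(
    (n1 * math.log(geometric_mean(x1)) + n2 * math.log(geometric_mean(x2))) / (n1 + n2)
)

# (13) Ratios of corresponding observations in two series.
a = [2.0, 9.0, 4.0]
b = [1.0, 3.0, 8.0]
g_ratio = geometric_mean([p / q for p, q in zip(a, b)])
ratio_g = geometric_mean(a) / geometric_mean(b)

# (14) Products of corresponding observations in three series.
c = [5.0, 1.0, 2.0]
g_product = geometric_mean([p * q * r for p, q, r in zip(a, b, c)])
product_g = geometric_mean(a) * geometric_mean(b) * geometric_mean(c)

print(g_whole, g_parts)
print(g_ratio, ratio_g)
print(g_product, product_g)
```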

        <pb n="145" />
24. The use of the geometric mean finds its simplest application
in estimating the numbers of a population midway between two
epochs (say two census years) at which the population is known.
[Fig. 24.—Showing the Populations of certain rural counties of England for each Census year from 1801 to 1901 (curves for Cumberland, Dorset and Hereford).]

If nothing is known concerning the increase of the population
save that the numbers recorded at the first census were $P_0$, and at
the second census n years later $P_n$, the most reasonable assumption
to make is that the percentage increase in each year has
been the same, so that the populations in successive years form a
geometric series, $P_0 r$ being the population a year after the first
census, $P_0 r^2$ two years after the first census, and so on, and

$$P_n = P_0 r^n \qquad (15)$$

The population midway between the two censuses is therefore

$$P_{n/2} = P_0 r^{n/2} = (P_0 P_n)^{1/2} \qquad (16)$$
        <pb n="146" />
i.e. the geometric mean of the numbers given by the two censuses.
This result must, however, be used with discretion. The rate of
increase of population is not necessarily, or even usually, constant
over any considerable period of time: if it were so, a curve
representing the growth of population as in fig. 24 would be
continuously convex to the base, whether the population were
increasing or decreasing. In the diagram it will be seen that
the curves are frequently concave towards the base, and similar
results will often be found for districts in which the population is
not increasing very rapidly, and from which there is much
emigration. Further, the assumption is not self-consistent in any
case in which the rate of increase is not uniform over the entire
area—and almost any area can be analysed into parts which are not
similar in this respect. For if in one part of the area considered
the initial population is $P_0$ and the common ratio R, and in the
remainder of the area the initial population is $p_0$ and the common
ratio r, the population in year n is given by

$$P_n + p_n = P_0 R^n + p_0 r^n.$$

This does not represent a constant rate of increase unless $R = r$.
If then, for example, a constant percentage rate of increase be
assumed for England and Wales as a whole, it cannot be assumed
for the Counties: if it be assumed for the Counties, it cannot be
assumed for the country as a whole. The student is referred to
refs. 14, 15 for a discussion of methods that may be used for the
consistent estimation of populations under such circumstances.
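The estimate of equation (16), and the inconsistency just noted, can be sketched as follows. The census figures and growth ratios are hypothetical. The whole-area ratio is found from equation (15); the second part shows that two parts growing at constant but different rates give a total whose year-to-year ratio is not constant:

```python
import math

# Hypothetical censuses n = 10 years apart.
p_first, p_second, n = 100_000, 121_000, 10

ratio = (p_second / p_first) ** (1 / n)     # r, from equation (15)
midway = math.sqrt(p_first * p_second)       # equation (16)
print(midway, p_first * ratio ** (n / 2))    # both 110000, up to rounding

# Two parts of one area, each growing at its own constant rate (hypothetical).
P0, R = 60_000, 1.02     # first part: initial population and common ratio
p0, r = 40_000, 0.99     # remainder of the area
totals = [P0 * R ** t + p0 * r ** t for t in range(4)]
yearly_ratios = [totals[t + 1] / totals[t] for t in range(3)]
print([round(x, 5) for x in yearly_ratios])  # not constant, since R != r
```

The yearly ratios creep upward because the faster-growing part makes up an increasing share of the total.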
25. The property of the geometric mean illustrated by equation
(13) renders it, in some respects, a peculiarly convenient form of
average in dealing with ratios, i.e. “index-numbers,” as they are
termed, of prices. Let

$$X_{01},\ X_{02},\ X_{03},\ \ldots,\ X_{0N}$$
$$X_{11},\ X_{12},\ X_{13},\ \ldots,\ X_{1N}$$
$$X_{21},\ X_{22},\ X_{23},\ \ldots,\ X_{2N}$$

denote the prices of N commodities in the years 0, 1, 2, . . .
Further, let $Y_{11} = X_{11}/X_{01}$, and so on, so that

$$Y_{11},\ Y_{12},\ Y_{13},\ \ldots,\ Y_{1N}$$
$$Y_{21},\ Y_{22},\ Y_{23},\ \ldots,\ Y_{2N}$$

represent the ratios of the prices of the several commodities in years
1, 2, . . . to their prices in year 0. These ratios, in practice
multiplied by 100, are termed index-numbers of the prices of the
several commodities, on the year 0 as base. Evidently some

        <pb n="147" />
form of average of the Y’s for any given year will afford an
indication of the general level of prices for that year, provided the
commodities chosen are sufficiently numerous and representative.
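The question is, what form of average to choose. If the geometric
mean be chosen, and $G_1$, $G_2$ denote the geometric means of the
Y’s for the years 1 and 2 respectively, we have

$$\frac{G_2}{G_1} = \left(\frac{Y_{21}}{Y_{11}} \cdot \frac{Y_{22}}{Y_{12}} \cdots \frac{Y_{2N}}{Y_{1N}}\right)^{1/N} = \left(\frac{X_{21}/X_{01}}{X_{11}/X_{01}} \cdot \frac{X_{22}/X_{02}}{X_{12}/X_{02}} \cdots \frac{X_{2N}/X_{0N}}{X_{1N}/X_{0N}}\right)^{1/N} = \left(\frac{X_{21}}{X_{11}} \cdot \frac{X_{22}}{X_{12}} \cdots \frac{X_{2N}}{X_{1N}}\right)^{1/N}$$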
From the first form of this equation we see that the ratio of the
geometric mean index-number in year 2 to that in year 1 is
identical with the geometric mean of the ratios for the index-
numbers of the several commodities. A similar property does
not hold for any other form of average : the ratio of the arithmetic
mean index-numbers is not the same as the arithmetic mean of
the ratios, nor is the ratio of the medians the median of the
ratios. From the second and third forms of the equation it
appears further that the ratio of the geometric mean index-number
in year 2 to that in year 1 is independent of the prices in
the year first chosen as base (i.e. year 0), and is identical with the
geometric mean of the index-numbers for year 2, on year 1 as
base. Again, a similar property does not hold for any other form
of average. If arithmetic means of the index-numbers be taken,
for example, the ratio of the mean in year 2 to the mean in year
1 will vary with the year taken as base, and will differ more or
less from the arithmetic mean ratio of the prices in year 2 to the
prices of the same commodities in year 1; the same statement is
true if medians be used. The results given by the use of the
geometric mean possess, therefore, a certain consistency that is
not exhibited if other forms of average are employed. It was
used in a classical paper by Jevons (ref. 4), though not on quite
the same grounds, but has never been at all generally employed.
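The consistency property can be checked on hypothetical prices for three commodities. With the geometric mean, the ratio of the year-2 average to the year-1 average is the same whichever year serves as base, and equals the geometric mean of the year-2 prices on year 1; with the arithmetic mean it is not. The index-numbers are left as ratios rather than multiplied by 100, which does not affect any of the comparisons:

```python
from statistics import fmean, geometric_mean

# Hypothetical prices of three commodities in years 0, 1 and 2.
prices = {
    0: [10.0, 20.0, 40.0],
    1: [12.0, 18.0, 50.0],
    2: [15.0, 24.0, 44.0],
}

def index_numbers(year, base):
    """Price ratios Y of each commodity in `year` on `base` (not multiplied by 100)."""
    return [x / x0 for x, x0 in zip(prices[year], prices[base])]

def ratio_year2_to_year1(average, base):
    """Ratio of the average index-number for year 2 to that for year 1."""
    return average(index_numbers(2, base)) / average(index_numbers(1, base))

for name, average in (("geometric", geometric_mean), ("arithmetic", fmean)):
    on_base_0 = ratio_year2_to_year1(average, base=0)
    on_base_2 = ratio_year2_to_year1(average, base=2)
    direct = average(index_numbers(2, base=1))   # year 2 on year 1 as base
    print(f"{name:10}  base 0: {on_base_0:.4f}  base 2: {on_base_2:.4f}  direct: {direct:.4f}")
```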
26. The general use of the geometric mean has been suggested
on another ground, namely, that the magnitudes of deviations
appear, as a rule, to be dependent in some degree on the magni-
tude of the average; thus the length of a mouse varies less than
the stature of a man, and the height of a shrub less than that of
a tree. Hence, it is argued, variations in such cases should be
measured rather by their ratio to, than their difference from, the
average ; and if this is done, the geometric mean is the natural
average to use. If deviations be measured in this way, a
        <pb n="148" />

deviation $G/r$ will be regarded as the equivalent of a deviation $rG$,
instead of a deviation $-x$ as the equivalent of a deviation $+x$.
If a distribution take the simplest possible form when relative
deviations are regarded as equivalents, the frequency of deviations
between $G/s$ and $G/r$ will be equal to the frequency of deviations
between $rG$ and $sG$. The frequency-curve will then be symmetrical
round log G if plotted to log X as base, and if there be
a single mode, log G will be that mode (a logarithmic or geometric
mode, as it might be termed); G will not be the mode if the distribution
be plotted in the ordinary way to values of X as base.
The theory of such a distribution has been discussed by more than
one author (refs. 2, 8,9). The general applicability of the assump-
tion made does not, however, appear to have been very widely
tested, and the reasons assigned have not sufficed to bring the
geometric mean into common use. It may be noted that, as the
geometric mean is always less than the arithmetic mean, the
fundamental assumption which would justify the use of the former
clearly does not hold where the (arithmetic) mode is greater than
the arithmetic mean, as in Tables X. and XI. of the last chapter.
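A hypothetical distribution that is symmetrical in the sense just described, with values lying at equal ratios above and below 10, shows the geometric mean at the centre of symmetry on the logarithmic scale, while the arithmetic mean is drawn well above it:

```python
from statistics import fmean, geometric_mean, median

G = 10.0
# Hypothetical values whose deviations from G are equal ratios: G/4, G/2, G, 2G, 4G.
values = [G / 4, G / 2, G, 2 * G, 4 * G]

g = geometric_mean(values)   # the centre on the logarithmic scale
m = median(values)
a = fmean(values)
print(round(g, 6), m, a)     # 10.0 10.0 15.5
```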

27. The Harmonic Mean.—The harmonic mean of a series of
quantities is the reciprocal of the arithmetic mean of their
reciprocals, that is, if H be the harmonic mean,

$$\frac{1}{H} = \frac{1}{N}\,\Sigma\left(\frac{1}{X}\right) \qquad (17)$$
The following illustration, the result of which is required for an
example in a later chapter (Chap. XIII. § 11), will serve to show
the method of calculation.

The table gives the number of litters of mice, in certain
breeding experiments, with given numbers (X) in the litter. (Data
from A. D. Darbishire, Biometrika, iii. pp. 30, 31.)

Number in Litter, X    Number of Litters, f    f/X
1                      7                       7.000
2                      11                      5.500
3                      16                      5.333
4                      17                      4.250
5                      26                      5.200
6                      31                      5.167
7                      11                      1.571
8                      1                       0.125
9                      1                       0.111
Total                  121                     34.257

        <pb n="149" />
Whence 1/H = 0.2831, H = 3.532. The arithmetic mean is 4.587,
or more than a unit greater.
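The calculation from Darbishire's litter table can be repeated in a few lines. The sketch applies equation (17), weighting each reciprocal 1/X by its frequency f, and computes the arithmetic mean alongside for comparison:

```python
litter_size = [1, 2, 3, 4, 5, 6, 7, 8, 9]       # X, number in litter
litters = [7, 11, 16, 17, 26, 31, 11, 1, 1]     # f, number of litters

n = sum(litters)                                               # 121
sum_f_over_x = sum(f / x for x, f in zip(litter_size, litters))
harmonic = n / sum_f_over_x                                    # equation (17)
arithmetic = sum(f * x for x, f in zip(litter_size, litters)) / n

print(round(1 / harmonic, 4), round(harmonic, 3), round(arithmetic, 3))   # 0.2831 3.532 4.587
```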

If the prices of a commodity at different places or times are
stated in the form “so much for a unit of money,” and an average
price obtained by taking the arithmetic mean of the quantities
sold for a unit of money, the result is equivalent to the harmonic
mean of prices stated in the ordinary way. Thus retail prices of
eggs were quoted before the War as “so many to the shilling.”
Supposing we had 100 returns of retail prices of eggs, 50 returns
showing twelve eggs to the shilling, 30 fourteen to the shilling,
and 20 ten to the shilling; then the mean number per shilling
would be 12.2, equivalent to a price of 0.984d. per egg. But
if the prices had been quoted in the form usual for other com-
modities, we should have had 50 returns showing a price of 1d.
per egg, 30 showing a price of 0.857d., and 20 a price of 1.2d.:
arithmetic mean 0.997d., a slightly greater value than the harmonic
mean of 0.984d. The official returns of prices in India were,
until 1907, given in the form of “Sers (2.057 lbs.) per rupee.”
The average annual price of a commodity was based on half-
monthly prices stated in this form, and “index-numbers” were
calculated from such annual averages. In the issues of “Prices
and Wages in India” for 1908 and later years the prices have
been stated in terms of “rupees per maund (82.286 lbs.).” The
change, it will be seen, amounts to a replacement of the harmonic
by the arithmetic mean price.
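The egg example can be reproduced directly from the returns given above. The sketch computes the mean number of eggs per shilling and the equivalent price per egg, then both the harmonic and the arithmetic mean of the prices expressed in pence per egg:

```python
# Returns from the text: eggs per shilling -> number of returns.
returns = {12: 50, 14: 30, 10: 20}
n = sum(returns.values())

mean_eggs = sum(eggs * k for eggs, k in returns.items()) / n   # eggs per shilling
price_from_mean_eggs = 12 / mean_eggs                          # pence per egg

# The same returns expressed as prices in pence per egg (12 pence to the shilling).
prices = {12 / eggs: k for eggs, k in returns.items()}
arithmetic_price = sum(p * k for p, k in prices.items()) / n
harmonic_price = n / sum(k / p for p, k in prices.items())

print(round(mean_eggs, 1), round(price_from_mean_eggs, 3),
      round(harmonic_price, 3), round(arithmetic_price, 3))   # 12.2 0.984 0.984 0.997
```

The price derived from the mean number of eggs coincides with the harmonic mean of the prices, which is the point of the paragraph above.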

The harmonic mean of a series of quantities is always lower
than the geometric mean of the same quantities (unless the quantities
are all equal), and, à fortiori, lower than the arithmetic mean, the
amount of difference depending largely on the magnitude of the
dispersion relatively to the magnitude of the mean. (Cf. Question 9,
Chap. VIII.)
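A quick check of the ordering H, G, A, and of its dependence on dispersion, using the standard-library `statistics` functions on two hypothetical series that share the same geometric mean (4) but differ in spread:

```python
from statistics import fmean, geometric_mean, harmonic_mean

def three_means(values):
    """Harmonic, geometric and arithmetic means of positive values."""
    return harmonic_mean(values), geometric_mean(values), fmean(values)

narrow = [2.0, 4.0, 8.0]     # smaller dispersion relative to the mean
wide = [1.0, 4.0, 16.0]      # larger dispersion relative to the mean

for values in (narrow, wide):
    h, g, a = three_means(values)
    print(f"H = {h:.3f}, G = {g:.3f}, A = {a:.3f}")
```

The gap between the three averages widens markedly for the more dispersed series.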

REFERENCES.
General.

(1) FECHNER, G. T., “Ueber den Ausgangswerth der kleinsten Abweichungssumme,
dessen Bestimmung, Verwendung und Verallgemeinerung,”
Abh. d. kgl. sächsischen Gesellschaft d. Wissenschaften, vol.
xviii. (also numbered xi. of the Abh. d. math.-phys. Classe); Leipzig
(1878), p. 1. (The average defined as the origin from which the
dispersion, measured in one way or another, is a minimum: geometric
mean dealt with incidentally, pp. 13-16.)

(2) FECHNER, G. T., Kollektivmasslehre, edited by G. F. Lipps;
Engelmann, Leipzig, 1897. (Posthumously published: deals with
frequency-distributions, their forms, averages, and measures of dis-
persion in general : includes much of the matter of (1).)

(3) ŽIŽEK, FRANZ, Die statistischen Mittelwerthe; Duncker und Humblot, Leipzig,
1908: English translation, Statistical Averages, translated with additional
notes, etc., by W. M. Persons, Holt &amp; Co., New York, 1913. (Non-mathematical,
but useful to the economic student for references cited.)

        <pb n="150" />
        THEORY OF STATISTICS.
The Geometric Mean.

(4) JEVONS, W. STANLEY, A Serious Fall in the Value of Gold ascertained
and its Social Effects set forth; Stanford, London, 1863. Reprinted
in Investigations in Currency and Finance; Macmillan, London, 1884.
(The geometric mean applied to the measurement of price changes.)

(5) JEVONS, W. STANLEY, “On the Variation of Prices and the Value of
the Currency since 1782,” Jour. Roy. Stat. Soc., vol. xxviii., 1865.
Also reprinted in volume cited above.

(6) EDGEWORTH, F. Y., “On the Method of ascertaining a Change in the
Value of Gold,” Jour. Roy. Stat. Soc., vol. xlvi., 1883, p. 714. (On
some of the reasons assigned by Jevons for the use of the geometric
mean.)

(7) GALTON, FRANCIS, “The Geometric Mean in Vital and Social Statistics,”
Proc. Roy. Soc., vol. xxix., 1879, p. 365.

(8) McALISTER, DONALD, “The Law of the Geometric Mean,” ibid., p. 367.
(The law of frequency to which the use of the geometric mean would
be appropriate.)

(9) KAPTEYN, J. C., Skew Frequency-curves in Biology and Statistics;
Noordhoff, Groningen, and Wm. Dawson, London, 1903. (Contains,
amongst other forms, a generalisation of McAlister’s law.)

(10) CRAWFORD, G. E., “An Elementary Proof that the Arithmetic Mean
of any number of Positive Quantities is greater than the Geometric
Mean,” Proc. Edin. Math. Soc., vol. xviii., 1899-1900.

See also refs. 1 and 2.

The Mode.

(11) PEARSON, KARL, “Skew Variation in Homogeneous Material,” Phil.
Trans. Roy. Soc., Series A, vol. clxxxvi., 1895, p. 343. (Definition of
mode, p. 345.)

(12) YULE, G. U., “Notes on the History of Pauperism in England and
Wales, etc.: Supplementary Note on the Determination of the Mode,”
Jour. Roy. Stat. Soc., vol. lix., 1896, p. 348. (The note deals with
elementary methods of approximately determining the mode: the
one-third rule and one other.)

(13) PEARSON, KARL, “On the Modal Value of an Organ or Character,”
Biometrika, vol. i., 1902, p. 260. (A warning as to the inadequacy of
mere inspection for determining the mode.)

Estimates of Population.

(14) WATERS, A. C., “A Method for estimating Mean Populations in the
last Intercensal Period,” Jour. Roy. Stat. Soc., vol. lxiv., 1901, p. 293.

(15) WATERS, A. C., Estimates of Population: Supplement to Annual Report of
the Registrar-General for England and Wales (Cd. 2618, 1907, p. cxvii.)

For the methods actually used, see the Reports of the Registrar-General
of England and Wales for 1907, pp. cxxxii-cxxxiv, and for 1910,
pp. xi-xii. Cf. SNOW, ref. 11, Chap. XII., for a different method
based on the symptoms of growth such as numbers of births or of houses.

Index-numbers.

These were incidentally referred to in § 25. The general theory of
index-numbers and the different methods in which they may be formed
are not considered in the present work. The student will find copious
references to the literature in the following :—

(16) EDGEWORTH, F. Y., “Reports of the Committee appointed for the

        <pb n="151" />
        VII.—AVERAGES.
purpose of investigating the best methods of ascertaining and measuring
Variations in the Value of the Monetary Standard,” British Association
Reports, 1887 (p. 247), 1888 (p. 181), 1889 (p. 133), and 1890 (p. 485).

(17) EDGEWORTH, F. Y., Article “Index-numbers” in Palgrave’s Dictionary
of Political Economy, vol. ii.; Macmillan, 1896.

(18) FOUNTAIN, H., “Memorandum on the Construction of Index-numbers
of Prices,” in the Board of Trade Report on Wholesale and Retail
Prices in the United Kingdom, 1903.

EXERCISES.

1. Verify the following means and medians from the data of Table VI.,
Chap. VI., p. 88.

Stature in Inches for Adult Males in—

                England.   Scotland.   Wales.   Ireland.
Mean . . .       67.31      68.55      66.62     67.78
Median . .       67.35      68.48      66.56     67.69

In the calculation of the means, use the same arbitrary origin as in Example
ii., and check your work by the method of § 13 (d).

2. Find the mean weight of adult males in the United Kingdom from the
data in the last column of Table IX., Chap. VI., p. 95. Also find the median
weight, and hence the approximate mode, by the method of § 20.

3. Similarly, find the mean, median, and approximate value of the mode
for the distribution of fecundity in race-horses, Table X., Chap. VI., p. 96.

4. Using a graphical method, find the median annual value of houses
assessed to inhabited house duty in the financial year 1885-6 from the data
of Table IV., Chap. VI., p. 83.

5. (Data from Sauerbeck, Jour. Roy. Stat. Soc., March 1909.) The figures
in columns 1 and 2 of the small table below show the index-numbers (or per-
centages) of prices of certain animal foods in the years 1898 and 1908, on
their average prices during the years 1867-77. In column 3 have been added
the ratios of the index-numbers in 1908 to the index-numbers in 1898, the
latter being taken as 100.

Find the average ratio of prices in 1908 to prices in 1898, taken as 100:—

(1) From the arithmetic mean of the ratios in col. 3.

(2) From the ratio of the arithmetic means of cols. 1 and 2.

(3) From the ratio of the geometric means of cols. 1 and 2.

(4) From the geometric mean of the ratios in col. 3.

Note that, by § 25, the last two methods must give the same result.

                           Index-number of Price in      Ratio,
Commodity.                    1898.        1908.         08/98.
                               (1)          (2)           (3)
1. Beef, prime . . .           78           88           112.8
2. Beef, middling . .          72           90           125.0
3. Mutton, prime . .           84           92           109.5
4. Mutton, middling .           …            …           141.8
5. Pork . . . .                 …            …            95.4
6. Bacon . . . .                …            …           107.7
7. Butter . . . .               …            …           119.7
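The remark that methods (3) and (4) must agree can be checked with a short Python sketch. Only the three rows whose index-numbers in cols. 1 and 2 are legible in this copy (the two beef rows and prime mutton) are used, and the ratios are recomputed from cols. 1 and 2 rather than read from the rounded col. 3, so the printed averages are illustrative and are not the answer to the exercise.

```python
import math
from statistics import fmean, geometric_mean

# Legible rows only: Beef, prime; Beef, middling; Mutton, prime.
col1_1898 = [78, 72, 84]
col2_1908 = [88, 90, 92]
col3_ratio = [100 * b / a for a, b in zip(col1_1898, col2_1908)]

results = {
    "(1) arithmetic mean of ratios": fmean(col3_ratio),
    "(2) ratio of arithmetic means": 100 * fmean(col2_1908) / fmean(col1_1898),
    "(3) ratio of geometric means": 100 * geometric_mean(col2_1908) / geometric_mean(col1_1898),
    "(4) geometric mean of ratios": geometric_mean(col3_ratio),
}
for name, value in results.items():
    print(f"{name}: {value:.2f}")

# Methods (3) and (4) agree apart from floating-point rounding.
assert math.isclose(results["(3) ratio of geometric means"],
                    results["(4) geometric mean of ratios"])
```

Methods (1) and (2) generally differ, which is the point of asking for all four.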

        <pb n="152" />

6. (Data from census of 1901.) The table below shows the population of
the rural sanitary districts of Essex, the urban sanitary districts (other than
the borough of West Ham), and the borough of West Ham, at the censuses
of 1891 and 1901. Estimate the total population of the county at a date
midway between the two censuses, (1) on the assumption that the percentage
rate of increase is constant for the county as a whole, (2) on the assumption
that the percentage rate of increase is constant in each group of districts and
the borough of West Ham.

                                  Population.
Essex.                        1891.          1901.
Rural districts . . .        232,867        240,776
West Ham . . . .             204,903        267,358
Other urban districts .      345,604        575,864
Total . . . .                783,374      1,083,998

7. (Data from Agricultural Statistics for 1905, Cd. 8061, 1906.) The
following statement shows the monthly average prices of eggs in Great
Britain in 1905, as compiled from the weekly returns of market prices for
first and second quality British eggs, per 120 :—

Month.            First Quality.     Second Quality.
                     s.   d.             s.   d.
January . . .        13   …              11   0
February . . .       11   0               9   0
March . . .           …                   …
April . . .           …                   …
May . . .             …                   …
June . . .            …                   …
July . . .            …                   …
August . . .          …                   …
September . .         …                   …
October . . .        14   0              12   3
November . . .       18   0              16   0
December . . .       17   6              15   0
Mean for year        11   5…             10   0…
What would have been the mean price for the year in each case if the wholesale
prices had been recorded in the same way as retail prices, i.e. at so many
eggs per shilling? State your answer in the form of the equivalent price per
120, and obtain it in the shortest way by taking the harmonic mean of the
above prices (cf. § 27).

8. Supposing the frequencies of values 0, 1, 2, . . . of a variable to be
given by the terms of the binomial series

$$q^n, \quad n.q^{n-1}p, \quad \frac{n(n-1)}{1.2}\,q^{n-2}p^2, \quad \dots$$

where p + q = 1, find the mean.

        <pb n="153" />
        CHAPTER VIII.
MEASURES OF DISPERSION, ETC.

1. Inadequacy of the range as a measure of dispersion—2-13. The standard
deviation : its definition, calculation, and properties—14-19. The
mean deviation : its definition, calculation, and properties—20-24. The
quartile deviation or semi-interquartile range—25. Measures of
relative dispersion—26. Measures of asymmetry or skewness—27-30.
The method of grades or percentiles.

1. THE simplest possible measure of the dispersion of a series of
values of a variable is the actual range, i.e. the difference between
the greatest and least values observed. While this is frequently
quoted, it is as a rule the worst of all possible measures for any
serious purpose. There are seldom real upper and lower limits
to the possible values of the variable, very large or very small
values being only more or less infrequent: the range is therefore
subject to meaningless fluctuations of considerable magnitude
according as values of greater or less infrequency happen to
have been actually observed. Note, for instance, the figures of
Table IX., Chap. VI., p. 95, showing the frequency-distributions of
weights of adult males in the several parts of the United Kingdom.
In Wales, one individual was observed with a weight of
over 280 lbs., the next heaviest being under 260 lbs. The
addition of this one very exceptional individual has increased the
range by some 30 lbs., or about one-fifth. A measure subject to
erratic alterations by casual influences in this way is clearly not
of much use for comparative purposes. Moreover, the measure
takes no account of the form of the distribution within the limits
of the range: of two distributions covering precisely the same
range of variation, one might show the observations for the most
part closely clustered round the average, while the other exhibited
an almost even distribution of frequency over the whole range.
Clearly we should not regard two such distributions as exhibiting
the same dispersion, though they exhibit the same range. Some
sort of measure of dispersion is therefore required, based, like the
averages discussed in the last
        <pb n="154" />

chapter, on all the observations made, so that no single observation
can have an unduly preponderant effect on its magnitude ; indeed,
the measure should possess all the properties laid down as desir-
able for an average in § 4 of Chap. VII. There are three such
measures in common use—the standard deviation, the mean
deviation, and the quartile deviation or semi-interquartile range,
of which the first is the most important.

2. The Standard Deviation.—The standard deviation is the
square root of the arithmetic mean of the squares of all deviations,
deviations being measured from the arithmetic mean of the
observations. If the standard deviation be denoted by σ, and a
deviation from the arithmetic mean by x, as in the last chapter,
then the standard deviation is given by the equation

$$\sigma^2 = \frac{1}{N}\,\Sigma(x^2) \qquad (1)$$

To square all the deviations may seem at first sight an artificial
procedure, but it must be remembered that it would be useless to
take the mere sum of the deviations, in order to obtain a measure
of dispersion, since this sum is necessarily zero if deviations be
taken from the mean. In order to obtain some quantity that
shall vary with the dispersion it is necessary to average the
deviations by a process that treats them as if they were all of the
same sign, and squaring is the simplest process for eliminating
signs which leads to results of algebraical convenience.
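Equation (1) translates directly into code. The following Python sketch, on hypothetical data, also confirms that the plain sum of deviations from the mean vanishes (the reason the deviations are squared) and that the result agrees with the standard library's `statistics.pstdev`, which likewise divides by N rather than N − 1.

```python
import math
import statistics


def standard_deviation(values):
    """sigma from equation (1): root of the mean of squared deviations from the mean."""
    n = len(values)
    mean = sum(values) / n
    deviations = [v - mean for v in values]
    return math.sqrt(sum(x * x for x in deviations) / n)


data = [3.0, 5.0, 7.0, 9.0]        # hypothetical observations, mean 6
mean = sum(data) / len(data)
print("sum of deviations:", sum(v - mean for v in data))   # zero, as the text notes
print("sigma:", standard_deviation(data))
print("statistics.pstdev:", statistics.pstdev(data))
```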

3. A quantity analogous to the standard deviation may be
defined in more general terms. Let A be any arbitrary value of
X, and let ξ (as in Chap. VII. § 8) denote the deviation of X
from A; i.e. let

$$\xi = X - A.$$

Then we may define the root-mean-square deviation s from the
origin A by the equation

$$s^2 = \frac{1}{N}\,\Sigma(\xi^2) \qquad (2)$$

In terms of this definition the standard deviation is the
root-mean-square deviation from the mean. There is a very simple
relation between the standard deviation and the root-mean-square
deviation from any other origin. Let

$$M - A = d, \qquad (3)$$

so that ξ = x + d. Then

$$\xi^2 = x^2 + 2x.d + d^2,$$

$$\Sigma(\xi^2) = \Sigma(x^2) + 2d.\Sigma(x) + N.d^2.$$

        <pb n="155" />
        VIII.—MEASURES OF DISPERSION, ETC. 135
But the sum of the deviations from the mean is zero, therefore
the second term vanishes, and accordingly, dividing through by N,

$$s^2 = \sigma^2 + d^2 \qquad (4)$$

Hence the root-mean-square deviation is least when deviations
are measured from the mean, i.e. the standard deviation is the least
possible root-mean-square deviation.
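A small numerical check of equation (4), again on hypothetical data: for each trial origin A the mean square s² about A is compared with σ² + d², where d = M − A, and the smallest s is found when A is the mean itself.

```python
import math


def rms_about(values, origin):
    """Root-mean-square deviation s of the values about an arbitrary origin A."""
    return math.sqrt(sum((v - origin) ** 2 for v in values) / len(values))


data = [12.0, 15.0, 15.0, 18.0, 25.0]   # hypothetical observations
M = sum(data) / len(data)               # arithmetic mean (17.0 here)
sigma = rms_about(data, M)              # standard deviation

for A in (10.0, 15.0, M, 20.0):
    s = rms_about(data, A)
    d = M - A
    print(f"A = {A:5.1f}   s^2 = {s * s:7.3f}   sigma^2 + d^2 = {sigma * sigma + d * d:7.3f}")
```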

Σ(ξ²), or Σ(f.ξ²) if we are dealing with a grouped distribution
and f is the frequency of ξ, is sometimes termed the second moment
of the distribution about A, just as Σ(ξ) or Σ(f.ξ) is termed
the first moment (cf. Chap. VII. § 8): we shall not make use
of the term in the present work. Generally, Σ(f.ξⁿ) is termed
the nth moment.

4. If σ and d are the two sides of a right-angled triangle, s is
the hypotenuse. If, then, MH be the vertical through the
mean of a frequency-distribution (fig. 25), and MS be set off
equal to the standard deviation (on the same scale in which the
variable X is plotted along the base), SA will be the root-mean-square
deviation from the point A. This construction gives a
concrete idea of the way in which the root-mean-square deviation
depends on the origin from which deviations are measured. It
will be seen that for small values of d the difference of s from σ
will be very minute, since A will lie very nearly on the circle
drawn through M with centre S and radius SM: slight errors
in the mean due to approximations in calculation will not,
therefore, appreciably affect the value of the standard deviation.

[Fig. 25.]

5. If we have to deal with relatively few, say thirty or forty,
ungrouped observations, the method of calculating the standard
deviation is perfectly straightforward. It is illustrated by the
figures given below for the estimated average earnings of
        <pb n="156" />

agricultural labourers in 38 rural unions. The values (earnings)
are first of all totalled and the total divided by N to give the
arithmetic mean M, viz. 15s. 11.26d., or 15s. 11d. to the nearest
penny. The earnings being estimates, it is not necessary to take
the average to any higher degree of accuracy. Having found
the mean, the difference of each observation from the mean is
next written down as in col. 3, one penny being taken as the
unit: the signs are not entered, as they are not wanted, but the
work should be checked by totalling the positive and negative
differences separately. [The positive total is 300 and the
negative 290, thus checking the value for the mean, viz. 15s.
11d. + 10/38d.]

Finally, each difference is squared, and the squares entered in
col. 4,—tables of squares are useful for such work if any of the
differences to be squared are large (see list of Tables, p. 356).
The sum of the squares is 16,018. Treating the value taken for
the mean as sensibly accurate, we have—

$$\sigma^2 = \frac{16{,}018}{38} = 421.5$$

$$\sigma = 20.5\text{d.}$$

If we wish to be more precise we can reduce to the true mean
by the use of equation (4), as follows:—

$$s^2 = \frac{16{,}018}{38} = 421.5263$$

$$d = \frac{10}{38} = 0.2632; \qquad d^2 = 0.0693$$

Hence

$$\sigma^2 = s^2 - d^2 = 421.4570$$

$$\sigma = 20.529\text{d.}$$
Evidently this reduction, in the given case, is unnecessary,
illustrating the fact mentioned at the end of § 4, that small
errors in the mean have little effect on the value found for the
standard deviation. The first value is correct within a very
small fraction of a penny.
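The arithmetic of this section can be replayed in a few lines of Python. The only inputs are the totals quoted above (N = 38, sum of squares 16,018, positive and negative totals 300 and 290); nothing else from the table is needed.

```python
import math

N = 38
sum_of_squares = 16_018          # squared differences from 15s. 11d., in pence squared
net_difference = 300 - 290       # positive total minus negative total, in pence

s_sq = sum_of_squares / N        # mean square about the rounded mean, 15s. 11d.
d = net_difference / N           # true mean minus 15s. 11d., in pence
sigma_sq = s_sq - d ** 2         # equation (4) rearranged: sigma^2 = s^2 - d^2

print(f"s^2 = {s_sq:.4f}   d = {d:.4f}   d^2 = {d ** 2:.4f}")
print(f"rough sigma = {math.sqrt(s_sq):.2f}d.   reduced sigma = {math.sqrt(sigma_sq):.3f}d.")
```

The two printed values of σ differ only in the third decimal place, which is the point made at the end of § 4.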

        <pb n="157" />
CALCULATION OF THE STANDARD DEVIATION: Example i.—Calculation of
Mean and Standard Deviation for a Short Series of Observations
ungrouped. Estimated Average Weekly Earnings of Agricultural Labourers
in Thirty-eight Rural Unions, in 1892-3. (W. Little: Labour
Commission; Report, vol. v., part i., 1894.)

        (1)                   (2)              (3)              (4)
       Union.              Earnings         Difference     (Difference)².
                        (Shillings and      from Mean
                            Pence).       (Pence). ξ.          ξ².
                           s.    d.
 1. Glendale . . .         20    9              58            3,364
 2. Wigton . . .           20    3              52            2,704
 3. Garstang . . .         19    8              45            2,025
 4. Belper . . .           18    6              31              961
 5. Nantwich . . .         17    8              21              441
 6. Atcham . . .           17    6              19              361
 7. Driffield . . .        17    1              14              196
 8. Uttoxeter . . .        17    0              13              169
 9. Wetherby . . .         17    0              13              169
10. Easingwold . . .       16   11              12              144
11. Southwell . . .        16    6               7               49
12. Hollingbourn . .       16    4               5               25
13. Melton Mowbray .       16    …               …                …
14. Truro . . .            16    …               …                …
15. Godstone . . .          …                    …                …
16. Louth . . .             …                    …                …
17. Brixworth . . .         …                    …                …
18. Crediton . . .          …                    …                …
19. Holbeach . . .          …                    …                …
20. Maldon . . .            …                    …                …
21. Monmouth . . .          …                    …                …
22. St Neots . . .         15    0              11              121
23. Swaffham . . .         15    0              11              121
24. Thakeham . . .         15    0              11              121
25. Thame . . .            15    0              11              121
26. Thingoe . . .          15    0              11              121
27. Basingstoke . .        15    0              11              121
28. Cirencester . .        15    0              11              121
29. N. Witchford . .       14   10              13              169
30. Pewsey . . .           14    9              14              196
31. Bromyard . . .         14    9              14              196
32. Wantage . . .          14    9              14              196
33. Stratford-on-Avon .    14    7              16              256
34. Dorchester . . .       14    6              17              289
35. Woburn . . .           14    6              17              289
36. Buntingford . .        14    4              19              361
37. Pershore . . .         13    6              29              841
38. Langport . . .         12    6              41            1,681
Total . . .               605    8         +300, −290        16,018

        <pb n="158" />

The figures dealt with in this illustration are estimates of the
weekly earnings of the agricultural labourers, i.e. they include
allowances for gifts in kind, such as coal, potatoes, cider, etc. The
estimated weekly money wages are, however, also given in the
same Report, and we are thus enabled to make an interesting
comparison of the dispersions of the two. It might be expected
that earnings would vary less than wages, as his earnings and not
the mere money wages he receives are the important matter to
the labourer, and as a fact we find

Standard deviation of weekly earnings . . . . 20.5d.
Standard deviation of weekly money wages . . 26.0d.
The arithmetic mean wage is 13s. 5d.

6. If we have to deal with a grouped frequency-distribution,
the same artifices and approximations are used as in the calculation
of the mean (Chap. VII. §§ 8, 9, 10). The mid-value of one of
the class-intervals is chosen as the arbitrary origin A from which
to measure the deviations ξ; the class-interval is treated as a
unit throughout the arithmetic, and all the observations within
any one class-interval are treated as if they were identical with
the mid-value of the interval. If, as before, we denote the
frequency in any one interval by f, these f observations contribute
f.ξ² to the sum of the squares of deviations, and we
have—

$$s^2 = \frac{1}{N}\,\Sigma(f.\xi^2)$$
The standard deviation is then calculated from equation (4).

7. The whole of the work proceeds naturally as an extension of
that necessary for calculating the mean, and we accordingly use
the same illustrations as in the last chapter. Thus in Example
ii. below, cols. 1, 2, 3, and 4 are the same as those we have already
given in Example i. of Chap. VII. for the calculation of the mean.
Column 5 gives the figures necessary for calculating the standard
deviation, and is derived directly from col. 4 by multiplying the
figures of that column again by ξ. Thus 90 × 5 = 450, 192 × 4 =
768, and so on. The work is therefore done very rapidly. The
remaining steps of the arithmetic are given below the table; the
student must be careful to remember the final conversion, if
necessary, from the class-interval as unit to the natural unit
of measurement. In this case the value found is 2.48
class-intervals, and the class-interval being half a unit, that is
1.24 per cent.

        <pb n="159" />
CALCULATION OF THE STANDARD DEVIATION: Example ii.—Calculation of
the Standard Deviation of the Percentages of the Population in receipt of
Relief, in addition to the Mean, from the figures of Table VIII. of
Chap. VI. (Cf. the work for the mean alone, p. 111.)

    (1)            (2)           (3)             (4)          (5)
Percentage     Frequency.    Deviation       Product.     Product.
in receipt         f.        from Value A.     f.ξ.         f.ξ².
of Relief.                       ξ.
    1             18            −5               90          450
    1.5           48            −4              192          768
    2             72            −3              216          648
    2.5           89            −2              178          356
    3            100            −1              100          100
    3.5           90             0             −776           —
    4             75            +1               75           75
    4.5           60            +2              120          240
    5             40            +3              120          360
    5.5           21            +4               84          336
    6             11            +5               55          275
    6.5            5            +6               30          180
    7              1            +7                7           49
    7.5            1            +8                8           64
    8.5            1           +10               10          100
Total            632                           +509        4,001

From previous work, p. 111, M − A = d = −0.4225 class-intervals,

$$s^2 = \frac{\Sigma(f.\xi^2)}{N} = \frac{4001}{632} = 6.3307.$$

$$\sigma^2 = 6.3307 - (0.4225)^2 = 6.1522.$$

∴ σ = 2.48 intervals = 1.24 per cent.
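The whole of Example ii. can be reproduced with a short Python sketch. The frequencies are those of col. 2, keyed by the deviation ξ of col. 3, with A = 3.5 per cent. and a class-interval of half a unit; as in the text, every observation is treated as lying at the mid-value of its interval.

```python
import math

# Frequencies f keyed by deviation xi from A, in class-intervals (Example ii.).
freq_by_xi = {-5: 18, -4: 48, -3: 72, -2: 89, -1: 100, 0: 90,
              1: 75, 2: 60, 3: 40, 4: 21, 5: 11, 6: 5, 7: 1, 8: 1, 10: 1}
A, interval = 3.5, 0.5            # origin (per cent.) and width of a class-interval

N = sum(freq_by_xi.values())
sum_f_xi = sum(f * xi for xi, f in freq_by_xi.items())        # col. 4, signed
sum_f_xi2 = sum(f * xi * xi for xi, f in freq_by_xi.items())  # col. 5

d = sum_f_xi / N                  # M - A, in class-intervals
s_sq = sum_f_xi2 / N              # mean square about A
sigma = math.sqrt(s_sq - d * d)   # equation (4)

print(N, sum_f_xi, sum_f_xi2)
print(f"mean = {A + d * interval:.2f} per cent.")
print(f"sigma = {sigma:.2f} intervals = {sigma * interval:.2f} per cent.")
```

The final multiplication by `interval` is the conversion from class-intervals to natural units that the text warns the student not to forget.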

To illustrate again the value of the standard deviation for
purposes of comparison, figures are given below showing the
means and standard deviations of similar distributions for a series
of years from 1850. It will be seen that not only did the mean
decrease during the period, but the standard deviation decreased
to an equally marked extent, having been halved between
1850 and 1891 ; the average was lowered, and at the same time
the percentages of the population in receipt of relief clustered
much more closely round the lower average.

        <pb n="160" />

Means and Standard Deviations of the Distributions of Pauperism (Percentage
of the Population in receipt of Poor-law Relief) in the Unions of England
and Wales since 1850. (From Yule, Jour. Roy. Stat. Soc., vol. lix.,
1896, figures slightly amended.)

              Percentage of the Population
                  in receipt of Relief.
Year.        Arithmetic        Standard
               Mean.          Deviation.
1850           6.51             2.50
1860           5.20             2.07
1870           5.45             2.02
1881           3.68             1.36
1891           3.29             1.24

8. In the table given on p. 141 (Example iii.), the calculation of
the standard deviation is similarly shown for the distribution of
the statures of adult males in the British Isles, the work being
continued from the stage which it reached for the calculation of
the mean in Example ii. of Chap. VII. The steps of the arith-
metic hardly call for further explanation, but it may be noted that
the class-interval being a unit in this case, no conversion of
the standard deviation from class-intervals to units is required.

9. The student must remember, as in the case of the calculation
of the mean, that the treatment of all values within each
class-interval as if they were identical with the mid-value of the interval
is an approximation and no more (cf. Chap. VII. § 11), though,
for a distribution of the symmetrical or moderately asymmetrical
type with a class-interval not greater than one-twentieth or so
of the range, the approximation may be a very close one. But
while grouping may either increase or decrease the value of the
arithmetic mean, it tends to increase the standard deviation of
distributions which are not more than slightly asymmetrical, and
the increase is the greater the cruder the grouping. We give an
approximate correction for this effect later (Chap. XI. § 4). The
student is recommended to test for himself the effect of grouping
in two or three cases.
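One way to make the suggested test is to simulate it. The Python sketch below assumes a normal (Gaussian) sample of 100,000 values, which is not data from the text, groups it into intervals of several widths, and compares the grouped and ungrouped standard deviations. For a smooth distribution of this kind the excess of σ² due to grouping is usually close to (width)²/12, the quantity removed by Sheppard's correction.

```python
import math
import random


def pstdev(values):
    """Standard deviation with divisor N, as in equation (1)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / n)


def group(values, width):
    """Replace each value by the mid-value of its class-interval [k*width, (k+1)*width)."""
    return [(math.floor(v / width) + 0.5) * width for v in values]


rng = random.Random(1)                                  # fixed seed, reproducible sample
raw = [rng.gauss(0.0, 1.0) for _ in range(100_000)]     # assumed normal data, sigma = 1
sigma_raw = pstdev(raw)

for width in (0.1, 0.25, 0.5, 1.0):
    sigma_grouped = pstdev(group(raw, width))
    excess = sigma_grouped ** 2 - sigma_raw ** 2
    print(f"width {width:4}: sigma {sigma_grouped:.4f} (ungrouped {sigma_raw:.4f}), "
          f"excess of sigma^2 {excess:.4f}, width^2/12 {width ** 2 / 12:.4f}")
```

A width of 1.0 here is one standard deviation, i.e. roughly one-sixth of the practical range, which is far cruder grouping than the one-twentieth recommended in the text.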

10. It is a useful empirical rule to remember that a range of
six times the standard deviation usually includes 99 per cent. or
more of all the observations in the case of distributions of the
symmetrical or moderately asymmetrical type. Thus in Example

        <pb n="161" />
CALCULATION OF THE STANDARD DEVIATION: Example iii.—Calculation
of the Standard Deviation of Stature of Male Adults in the British Isles
from the figures of Table VI., p. 88. (Cf. p. 112 for the calculation of
the mean alone.)

   (1)          (2)           (3)           (4)         (5)
 Height.     Frequency.    Deviation     Product.    Product.
 Inches.        f.         from Value A.   f.ξ.        f.ξ².
                              ξ.
  57-            2           −10            20          200
  58-            4            −9            36          324
  59-           14            −8           112          896
  60-           41            −7           287        2,009
  61-           83            −6           498        2,988
  62-          169            −5           845        4,225
  63-          394            −4          1576        6,304
  64-          669            −3          2007        6,021
  65-          990            −2          1980        3,960
  66-         1223            −1          1223        1,223
  67-         1329             0         −8584          —
  68-         1230            +1          1230        1,230
  69-         1063            +2          2126        4,252
  70-          646            +3          1938        5,814
  71-          392            +4          1568        6,272
  72-          202            +5          1010        5,050
  73-           79            +6           474        2,844
  74-           32            +7           224        1,568
  75-           16            +8           128        1,024
  76-            5            +9            45          405
  77-            2           +10            20          200
 Total        8585                       +8763       56,809

From previous work, M − A = d = +0.0209 class-intervals or inches,

$$s^2 = \frac{\Sigma(f.\xi^2)}{N} = \frac{56{,}809}{8585} = 6.6172.$$

$$\sigma^2 = 6.6172 - (0.0209)^2 = 6.6168.$$

∴ σ = 2.57 class-intervals or inches.
ii. the standard deviation is 1.24 per cent.; six times this is 7.44
per cent., and a range from 0.75 to 8.19 per cent. includes all
but one observation out of 632. In Example iii. the standard
deviation is 2.57 in.; six times this is 15.42 in., and a range from,
say, 60 in. to 75.42 in. includes all but some 37 out of 8585
individuals, i.e. about 99.6 per cent. This rough rule serves to

        <pb n="162" />

give a more definite and concrete meaning to the standard
deviation, and also to check arithmetical work to some extent—
sufficiently, that is to say, to guard against very gross blunders.
It must not be expected to hold for short series of observations :
in Example i., for instance, the actual range is a good deal less
than six times the standard deviation.
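For comparison, if a distribution were exactly normal (an assumption; the rule in the text is purely empirical), the share lying within three standard deviations on either side of the mean, i.e. within a range of six standard deviations, follows from the error function `math.erf` in Python's standard library. The sketch prints it beside the shares quoted above for Examples ii. and iii.

```python
import math


def normal_share_within(k):
    """Share of a normal distribution lying within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2.0))


for k in (1, 2, 3):
    print(f"normal curve, within {k} sigma either side: {normal_share_within(k):.4f}")

# Shares quoted in the text for ranges of six standard deviations.
print(f"Example ii.:  {1 - 1 / 632:.4f}")
print(f"Example iii.: {1 - 37 / 8585:.4f}")
```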

11. The standard deviation is the measure of dispersion which
it is most easy to treat by algebraical methods, resembling in this
respect the arithmetic mean amongst measures of position. The
majority of illustrations of its treatment must be postponed to a
later stage (Chap. XI.), but the work of § 3 has already served as
one example, and we may take another by continuing the work of
§ 13 (b), Chap. VII. In that section it was shown that if a series
of observations of which the mean is M consist of two component
series, of which the means are M₁ and M₂ respectively,

$$N.M = N_1.M_1 + N_2.M_2,$$

N₁ and N₂ being the numbers of observations in the two component
series, and N = N₁ + N₂ the number in the entire series.
Similarly, the standard deviation σ of the whole series may be
expressed in terms of the standard deviations σ₁ and σ₂ of the
components and their respective means. Let

$$M_1 - M = d_1,$$

$$M_2 - M = d_2.$$

Then the mean-square deviations of the component series about
the mean M are, by equation (4), σ₁² + d₁² and σ₂² + d₂²
respectively. Therefore, for the whole series,

$$N.\sigma^2 = N_1(\sigma_1^2 + d_1^2) + N_2(\sigma_2^2 + d_2^2) \qquad (5)$$

If the numbers of observations in the component series be equal
and the means be coincident, we have as a special case—

$$\sigma^2 = \tfrac{1}{2}(\sigma_1^2 + \sigma_2^2) \qquad (6)$$

so that in this case the square of the standard deviation of the
whole series is the arithmetic mean of the squares of the standard
deviations of its components.

It is evident that the form of the relation (5) is quite general:
if a series of observations consists of r component series with
standard deviations σ₁, σ₂, . . . σᵣ, and means diverging from the
general mean of the whole series by d₁, d₂, . . . dᵣ, the standard
deviation σ of the whole series is given (using m to denote any
subscript) by the equation—

$$N.\sigma^2 = \Sigma(N_m.\sigma_m^2) + \Sigma(N_m.d_m^2) \qquad (7)$$
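Equation (7) can be checked numerically. In the Python sketch below the three component series are hypothetical; each is reduced to its number of observations, mean and standard deviation, the components are recombined by equation (7), and the result is compared with the mean and standard deviation of all the observations pooled together.

```python
import math


def mean_and_sigma(values):
    """Arithmetic mean and standard deviation (divisor N) of a series."""
    n = len(values)
    m = sum(values) / n
    return m, math.sqrt(sum((v - m) ** 2 for v in values) / n)


def combined_sigma(components):
    """Equation (7): components is a list of (N_m, M_m, sigma_m) tuples."""
    N = sum(n for n, _, _ in components)
    M = sum(n * m for n, m, _ in components) / N           # general mean
    total = sum(n * (s ** 2 + (m - M) ** 2) for n, m, s in components)
    return M, math.sqrt(total / N)


# Hypothetical component series.
series = [[61.0, 63.5, 64.0, 66.0], [66.0, 67.5, 69.0], [70.0, 71.0, 73.5, 74.0, 75.0]]
components = [(len(x), *mean_and_sigma(x)) for x in series]

print("from components:", combined_sigma(components))
print("pooled directly:", mean_and_sigma([v for x in series for v in x]))
```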

        <pb n="163" />
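Again, as in § 13 of Chap. VII., it is convenient to note, for the
checking of arithmetic, that if the same arbitrary origin be used
for the calculation of the standard deviations in a number of
component distributions we must have

$$\Sigma(f.\xi^2) = \Sigma_1(f.\xi^2) + \Sigma_2(f.\xi^2) + \Sigma_3(f.\xi^2) + \dots \qquad (8)$$

where Σ₁, Σ₂, Σ₃, . . . denote summation over the first, second,
third, . . . component distributions, and Σ over the whole.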
12. As another useful illustration, let us find the standard
deviation of the first N natural numbers. The mean in this case
is evidently (N + 1)/2. Further, as is shown in any elementary
Algebra, the sum of the squares of the first N natural numbers is

$$\frac{N(N+1)(2N+1)}{6}.$$

The standard deviation σ is therefore given by the equation—

$$\sigma^2 = \tfrac{1}{6}(N+1)(2N+1) - \tfrac{1}{4}(N+1)^2,$$

that is,

$$\sigma^2 = \tfrac{1}{12}(N^2 - 1) \qquad (9)$$

This result is of service if the relative merit of, or the relative
intensity of some character in, the different individuals of a series
is recorded not by means of measurements, e.g. marks awarded on
some system of examination, but merely by means of their
respective positions when ranked in order as regards the character,
in the same way as boys are numbered in a class. With N
individuals there are always N ranks, as they are termed,
whatever the character, and the standard deviation is therefore
always that given by equation (9).

Another useful result follows at once from equation (9), namely,
the standard deviation of a frequency-distribution in which all
values of X within a range ±l/2 on either side of the mean are
equally frequent, values outside these limits not occurring, so that
the frequency-distribution may be represented by a rectangle. The
base l may be supposed divided into a very large number N of equal
elements, each of length l/N, and the standard deviation then reduces
to that of the first N natural numbers with l/N as the unit, so that
σ² = (l/N)²(N² − 1)/12. When N is made indefinitely large the
single unit becomes negligible compared with N², and consequently

$$\sigma^2 = \frac{l^2}{12} \qquad (10)$$
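Both results of this section can be confirmed with exact rational arithmetic using Python's `fractions.Fraction`. The sketch computes the mean square deviation of the ranks 1, . . . N directly from the definition and compares it with equation (9); it then shows (l/N)²(N² − 1)/12 approaching the l²/12 of equation (10) as N grows, for a hypothetical base l = 6.

```python
from fractions import Fraction


def rank_variance(n):
    """Mean square deviation of the ranks 1, 2, ..., n from their mean (n + 1)/2."""
    mean = Fraction(n + 1, 2)
    return sum((Fraction(r) - mean) ** 2 for r in range(1, n + 1)) / n


for n in (2, 5, 10, 38):
    assert rank_variance(n) == Fraction(n * n - 1, 12)      # equation (9), exactly

# Equation (10): a rectangle of base l cut into n elements, each of width l/n.
l = Fraction(6)                                              # hypothetical base
for n in (10, 100, 1000):
    variance = (l / n) ** 2 * Fraction(n * n - 1, 12)
    print(n, float(variance), "limit", float(l * l / 12))
```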

13. It will be seen from the preceding paragraphs that the
standard deviation possesses the majority at least of the properties
which are desirable in a measure of dispersion as in an average
(Chap. VII. § 4). It is rigidly defined; it is based on all the
observations made; it is calculated with reasonable ease; it lends
itself readily to algebraical treatment ; and we may add, though the
student will have to take the statement on trust for the present,
that it is, as a rule, the measure least affected by fluctuations of
        <pb n="164" />

sampling. On the other hand, it may be said that its general
nature is not very readily comprehended, and that the process of
squaring deviations and then taking the square root of the mean
seems a little involved. The student will, however, soon surmount
this feeling after a little practice in the calculation and use of the
constant, and will realise, as he advances further, the advantages
that it possesses. Such root-mean-square quantities, it may be
added, frequently occur in other branches of science. The
standard deviation should always be used as the measure of disper-
sion, unless there is some very definite reason for preferring another
measure, just as the arithmetic mean should be used as the measure
of position. It may be added here that the student will meet with
the standard deviation under many different names, of which we
have adopted the most recent (due to Pearson, ref. 2): many of
the earlier names are hardly adapted to general use, as they bear
evidence of their derivation from the theory of errors of observation.
Thus the terms “mean error” (Gauss), “error of mean square”
(Airy), and “mean square error” have all been used in the same
sense. The standard deviation multiplied by the square root of
2 has been termed the “modulus” (Airy) (the student will see
later the reason for the adoption of the factor), and the reciprocal
of the modulus the “precision” (Lexis). For the square of the
standard deviation, often required, R. A. Fisher has suggested
the term “variance.”

14. The Mean Deviation.—The mean deviation of a series of
values of a variable is the arithmetic mean of their deviations
from some average, taken without regard to their sign. The
deviations may be measured either from the arithmetic mean or
from the median, but the latter is the natural origin to use. Just
as the root-mean-square deviation is least when deviations are
measured from the arithmetic mean, so the mean deviation is
least when deviations are measured from the median. For
suppose that, for some origin exceeded by m values out of N, the
mean deviation has a value Δ. Let the origin be displaced upwards by
an amount c until it is just exceeded by m − 1 of the values only,
i.e. until it coincides with the mth value from the upper end of
the series. By this displacement of the origin the sum of deviations
in excess of the origin is reduced by m.c, while the sum of
deviations in defect of the origin is increased by (N − m)c. The
new mean deviation is therefore

$$\Delta + \frac{(N - m)c - mc}{N} = \Delta + \frac{(N - 2m)c}{N}.$$
        <pb n="165" />
The new mean deviation is accordingly less than the old so long as
m&gt;31N.

That is to say, if N be even, the mean deviation is constant for
all origins within the range between the (N/2)th and the (N/2 + 1)th
observations, and this value is the least: if N be odd, the mean
deviation is lowest when the origin coincides with the ((N + 1)/2)th
observation. The mean deviation is therefore a minimum when
deviations are measured from the median or, if the latter be
indeterminate, from an origin within the range in which it lies.
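A brute-force check of the argument above, as a minimal Python sketch. The series `values` is hypothetical and the helper names are illustrative. The code evaluates the mean deviation about several origins and applies the displacement formula Δ + (N − 2m)c/N.

```python
from statistics import median

def mean_deviation(values, origin):
    """Mean of the deviations |x - origin|, taken without regard to sign."""
    return sum(abs(x - origin) for x in values) / len(values)

def shifted_mean_deviation(delta, n, m, c):
    """The text's formula: mean deviation after raising the origin by c,
    when m of the n values exceed the old origin and none is passed."""
    return delta + (n - 2 * m) * c / n

values = [1, 2, 4, 7, 11, 20]          # hypothetical series, N = 6 (even)

# Between the 3rd and 4th values (4 and 7) the mean deviation is constant, 31/6.
for origin in (4, median(values), 7):
    print(origin, round(mean_deviation(values, origin), 4))   # each prints 5.1667

# Outside that range, or about the arithmetic mean 7.5, it is larger.
for origin in (3, 8, 7.5):
    print(origin, round(mean_deviation(values, origin), 4))

# Displacement formula: origin 2.5 has m = 4 values above it; raise it by
# c = 1.5 so that it coincides with the 4th value from the top (4).
start = mean_deviation(values, 2.5)
print(round(shifted_mean_deviation(start, n=6, m=4, c=1.5), 4))   # 5.1667
```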

15. The calculation of the mean deviation either from the mean
or from the median for a series of ungrouped observations is very
simple. Take the figures of Example i. (p. 137) as an illustration.
We have already found the mean (15s. 11d. to the nearest penny),
and the deviations from the mean are written down in column 3.
Adding up this column without respect to the sign of the deviations
we find a total of 590. The mean deviation from the mean
is therefore 590/38 = 15.53d. The mean deviation from the
median is calculated in precisely the same way, but the median
replaces the mean as the origin from which deviations are measured.
The median is 15s. 6d. The deviations in pence run 63, 57, 50,
36, and so on; their sum is 570; and, accordingly, the mean
deviation from the median is 15d. exactly.

16. In the case of a grouped frequency-distribution, the sum
of deviations should be calculated first from the centre of the
class-interval in which the mean (or median) lies, and then
reduced to the mean as origin. Thus in the case of Example ii.
the mean is 3.29 per cent. and lies in the class-interval centring
round 3.5 per cent. We have already found that the sum of
deviations in defect of 3.5 per cent. is 776, and of deviations in
excess 509: total (without regard to sign) 1285, the unit of
measurement being, as it is necessary to remember, the
class-interval. If the number of observations below the mean is
N₁, and above the mean N₂, and M − A = d, as before, then each
deviation below the mean is increased by d and each deviation above
it diminished by d, so that we have to add N₁d to the sum found and
subtract N₂d. In the present case N₁ = 327 and N₂ = 305, while
d = −0.42 class-intervals, therefore

$$d(N_1 - N_2) = -0.42 \times 22 = -9.2,$$

and the sum of deviations from the mean is 1285 − 9.2 = 1275.8.
Hence the mean deviation from the mean is 1275.8/632 = 2.019
class-intervals, or 1.01 per cent.

17. The mean deviation from the median should be found in
precisely similar fashion, but the mid-value of the interval in
which the median (instead of the mean) lies should, for convenience,
        <pb n="166" />
be taken as origin. Thus in Example ii. the median is
(Chap. VII. § 15) 3.195 per cent. Hence 3.0 per cent. should be
taken as the origin, d = +0.39 intervals, N₁ = 327, N₂ = 305. The
deviation-sum with 3.0 as origin is found to be 1263, and the
correction is +0.39 × 22 = +8.6. Hence the mean deviation
from the median is 2.012 intervals, or again 1.01 per cent. The
value is really smaller than that of the mean deviation from the
arithmetic mean, but the difference is too slight to affect the
second place of decimals.
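The reduction used in §§ 16 and 17 can be sketched as one small Python function. It assumes, as the text does, that every observation sits at the mid-value of its class-interval. The function name and arguments are our own, and the figures fed to it are the totals quoted above for Example ii.

```python
def grouped_mean_deviation(abs_sum, n_below, n_above, d, n_total, width=1.0):
    """Mean deviation about an average, from a grouped frequency table.

    abs_sum : sum of |deviations| (class-interval units) measured from A,
              the mid-value of the class-interval containing the average
    n_below : N1, observations whose class mid-values lie below the average
    n_above : N2, observations whose class mid-values lie above it
    d       : (average - A) in class-interval units
    width   : size of the class-interval in the original units

    Each deviation below the average is increased (algebraically) by d and
    each one above it diminished by d, so the corrected sum is
    abs_sum + d * (N1 - N2). Returns the result in class-intervals and
    in the original units.
    """
    corrected = abs_sum + d * (n_below - n_above)
    in_intervals = corrected / n_total
    return in_intervals, in_intervals * width

# Example ii. (pauperism, class-interval 0.5 per cent.), figures from the text.
about_mean = grouped_mean_deviation(1285, 327, 305, d=-0.42, n_total=632, width=0.5)
about_median = grouped_mean_deviation(1263, 327, 305, d=0.39, n_total=632, width=0.5)
print([round(v, 3) for v in about_mean])     # [2.019, 1.009]
print([round(v, 3) for v in about_median])   # [2.012, 1.006]
```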

It should be noted that, as in the case of the standard deviation,
this method of calculation implies the assumption that all the
values of X within any one class-interval may be treated as if
they were the mid-value of that interval. This is, of course, an
approximation; but as a rule it gives results of amply sufficient
accuracy for practice if the class-interval be kept reasonably small
(cf. again Chap. VI. § 5). We have left it as an exercise to the
student to find the correction to be applied if the values in each
interval are treated as if they were evenly distributed over the
interval, instead of concentrated at its centre (Question 7).

18. The mean deviation, it will be seen, can be calculated rather
more rapidly than the standard deviation, though in the case of a
grouped distribution the difference in ease of calculation is not
great. It is not, on the other hand, a convenient magnitude for
algebraical treatment ; for example, the mean deviation of a dis-
tribution obtained by combining several others cannot in general
be expressed in terms of the mean deviations of the component
distributions, but depends upon their forms. As a rule, it is more
affected by fluctuations of sampling than is the standard deviation,
but may be less affected if large and erratic deviations lying
somewhat beyond the bulk of the distribution are liable to occur.
This may happen, for example, in some forms of experimental
work, and in such cases the use of the mean deviation may be
slightly preferable to that of the standard deviation.
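The contrast drawn here can be illustrated with a small constructed example (the data are hypothetical). Two components with the same N, mean and mean deviation give different mean deviations when each is pooled with the same third series. The standard deviation of a pooled distribution, by contrast, follows from the components' numbers, means and standard deviations alone. The mean deviation is taken about the median, per § 14.

```python
import math
from statistics import fmean, median, pstdev

def mean_deviation(values):
    """Mean deviation about the median."""
    m = median(values)
    return fmean([abs(x - m) for x in values])

def pooled_sd(parts):
    """Standard deviation of the combined data from (n, mean, sd) of each part.

    The second moment about the grand mean is the weighted average of
    sd**2 + (mean - grand_mean)**2 over the components.
    """
    n_total = sum(n for n, _, _ in parts)
    grand_mean = sum(n * m for n, m, _ in parts) / n_total
    second = sum(n * (s ** 2 + (m - grand_mean) ** 2) for n, m, s in parts) / n_total
    return math.sqrt(second)

# a and b share N = 4, mean 0 and mean deviation 1, but differ in form.
a = [-1.0, -1.0, 1.0, 1.0]
b = [-2.0, 0.0, 0.0, 2.0]
other = [1.0, 1.0, 3.0, 3.0]

print(mean_deviation(a), mean_deviation(b))                    # 1.0 1.0
print(mean_deviation(a + other), mean_deviation(b + other))    # 1.0 1.25

# The pooled standard deviation follows from the component summaries.
parts = [(len(b), fmean(b), pstdev(b)), (len(other), fmean(other), pstdev(other))]
print(pooled_sd(parts), pstdev(b + other))                      # both sqrt(2.5)
```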

19. It is a useful empirical rule for the student to remember
that for symmetrical or only moderately asymmetrical distributions,
approaching the ideal forms of figs. 5 and 9, the mean
deviation is usually very nearly four-fifths of the standard deviation.
Thus for the distribution of pauperism we have

$$\frac{\text{mean deviation}}{\text{standard deviation}} = \frac{1.01}{1.24} = 0.81.$$

In the case of the distribution of male statures in the British
Isles, Example iii., the ratio found is 0.80. For a short series of
observations like the wage statistics of Example i. a regular result
could hardly be expected: the actual ratio is 15.0/20.5 = 0.73.

        <pb n="167" />
We pointed out in § 10 that in distributions of the simple forms
referred to, a range of six times the standard deviation contains
over 99 per cent. of all the observations. If the mean deviation
be employed as the measure of dispersion, we must substitute a
range of 7½ times this measure.
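If the normal curve is taken as a stand-in for the ideal forms of figs. 5 and 9 (an assumption, since the text does not name a curve), the four-fifths rule and the range of 7½ mean deviations can be checked with Python's standard `statistics.NormalDist`:

```python
import math
from statistics import NormalDist

# Assumption: the normal curve represents the "ideal forms".
unit = NormalDist(mu=0.0, sigma=1.0)

md_over_sd = math.sqrt(2 / math.pi)            # mean deviation / s.d. for the normal curve
share_in_six_sd = unit.cdf(3) - unit.cdf(-3)   # frequency within mean +/- 3 s.d.
seven_and_half_md_in_sd = 7.5 * md_over_sd     # a range of 7.5 m.d., in s.d. units

print(round(md_over_sd, 3))                # 0.798
print(round(share_in_six_sd, 4))           # 0.9973
print(round(seven_and_half_md_in_sd, 2))   # 5.98
```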

20. The Quartile Deviation or Semi-interquartile Range. If a
value Q₁ of the variable be determined of such magnitude that
one-quarter of all the values observed are less than Q₁, and
three-quarters greater, then Q₁ is termed the lower quartile. Similarly,
if a value Q₃ be determined such that three-quarters of all the
values observed are less than Q₃, and one-quarter only greater,
then Q₃ is termed the upper quartile. The two quartiles and the
median divide the observed values of the variable into four
classes of equal frequency. If Mi be the value of the median, in
a symmetrical distribution

$$M_i - Q_1 = Q_3 - M_i,$$

and this difference may be taken as a measure of dispersion. But
as no distribution is rigidly symmetrical, it is usual to take as the
measure

$$Q = \frac{Q_3 - Q_1}{2},$$

and Q is termed the quartile deviation, or better, the
semi-interquartile range, since it is not a measure of the deviation from
any particular average: the old name probable error should be
confined to the theory of sampling (Chap. XV. § 17).

21. In the case of a short series of ungrouped observations
the quartiles are determined, like the median, by inspection.
In the wage statistics of Example i., for instance, there are
38 observations, and 38/4 = 9.5. What is the lower quartile?
The student may be tempted to take it halfway between the
ninth and tenth observations from the bottom of the list;
but this would be wrong, for then there would be nine
observations only below the value chosen instead of 9.5. The
quartile must be taken as given by the tenth observation
itself, which may be regarded as divided by the quartile, and
falling half above it and half below. Therefore

Lower quartile Q₁ = 14s. 10d.
Upper quartile Q₃ = 16s. 11d.

$$Q = \frac{Q_3 - Q_1}{2} = 12.5\text{d.}$$
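The rule of § 21 can be expressed as a small Python function, a sketch under one assumption. Each ranked observation is treated as straddling its own position, so the value with a fraction p of the observations below it falls at rank pN + ½. When that rank is fractional, a case the text does not treat, the sketch interpolates linearly. The helper names are hypothetical.

```python
import math

def ranked_quantile(values, p):
    """Value with a fraction p of the N observations below it.

    Assumption: the k-th smallest observation is split by the quantile,
    half above and half below, so it sits at cumulative count k - 1/2 and
    the target rank is p*N + 1/2; fractional ranks are interpolated.
    """
    xs = sorted(values)
    n = len(xs)
    rank = min(max(p * n + 0.5, 1.0), float(n))   # 1-based rank, kept in range
    k = math.floor(rank)
    frac = rank - k
    if frac == 0:
        return xs[k - 1]
    return xs[k - 1] + frac * (xs[k] - xs[k - 1])

def pence(shillings, d):
    """Convert shillings and pence to pence (12d. to the shilling)."""
    return 12 * shillings + d

# With N = 38, N/4 = 9.5 and the rank is 10: the tenth observation itself.
print(ranked_quantile(range(1, 39), 0.25))   # 10

# Example i.: Q1 = 14s. 10d., Q3 = 16s. 11d.
q1, q3 = pence(14, 10), pence(16, 11)
print((q3 - q1) / 2)                         # 12.5
```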

22. In the case of a grouped distribution, the quartiles, like
the median, are determined by simple arithmetical or by
        <pb n="168" />
graphical interpolation (cf. Chap. VII. §§ 15, 16). Thus for the
distribution of pauperism, Example ii., we have
632 ÷ 4 = 158
Total frequency under 2.25 per cent. = 138
Difference = 20
Frequency in interval 2.25-2.75 = 89

$$Q_1 = 2.25 + \frac{20}{89} \times 0.5 = 2.362 \text{ per cent.}$$

Similarly we find Q₃ = 4.130 per cent. Hence

$$Q = \frac{Q_3 - Q_1}{2} = 0.884 \text{ per cent.}$$

It is left to the student to check the value by graphical
interpolation.
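The arithmetical interpolation for Q₁ amounts to one line of arithmetic. The Python sketch below uses a hypothetical function name. Like simple interpolation, it assumes that the 89 cases of the class 2.25-2.75 are spread evenly over it. It takes Q₃ = 4.130 from the text rather than recomputing it.

```python
def interpolate_in_class(lower, width, freq_below, freq_in_class, target):
    """Simple arithmetical interpolation within one class-interval.

    Assumes the freq_in_class observations are spread evenly over
    [lower, lower + width], with freq_below observations lying below it.
    """
    return lower + (target - freq_below) / freq_in_class * width

# Example ii.: N = 632, so the lower quartile has 632/4 = 158 cases below it.
q1 = interpolate_in_class(lower=2.25, width=0.5, freq_below=138,
                          freq_in_class=89, target=632 / 4)
print(round(q1, 3))               # 2.362

q3 = 4.130                        # value quoted in the text
print(round((q3 - q1) / 2, 3))    # 0.884
```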

23. For distributions approaching the ideal forms of figs.
5 and 9, the semi-interquartile range is usually about two-thirds
of the standard deviation. Thus for Example ii. we find

$$\frac{Q}{\sigma} = \frac{0.884}{1.24} = 0.71.$$

The distribution of statures, Example iii., gives the ratio 0.68.
The short series of wage statistics in Example i. could not be
expected to give a result in very strict conformity with the
rule, but the actual ratio, viz. 0.61, does not diverge greatly.
It follows from this ratio that a range of nine times the
semi-interquartile range, approximately, is required to cover the same
proportion of the total frequency (99 per cent. or more) as a range
of six times the standard deviation.
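Under the same assumption as for the mean deviation, namely that the normal curve represents the ideal forms, the two-thirds rule and the range of nine semi-interquartile ranges can be checked in Python:

```python
from statistics import NormalDist

unit = NormalDist()                      # standard normal, assumed as the ideal form
q_over_sd = unit.inv_cdf(0.75)           # semi-interquartile range / s.d.
nine_q_in_sd = 9 * q_over_sd             # a range of nine Q, in s.d. units
share_in_nine_q = unit.cdf(4.5 * q_over_sd) - unit.cdf(-4.5 * q_over_sd)

print(round(q_over_sd, 4))        # 0.6745
print(round(nine_q_in_sd, 2))     # 6.07
print(share_in_nine_q)            # a little over 0.99
```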

24. Of the three measures of dispersion, the semi-interquartile
range has the most clear and simple meaning. It is calculated,
like the median, with great ease, and the quartiles may be found,
if necessary, by measuring two individuals only. If, e.g., the
dispersion as well as the average stature of a group of men
is required to be determined with the least possible expenditure
of time, they may be simply ranked in order of height, and the
three men picked out for measurement who stand in the centre
and one-quarter from either end of the rank. This measure of
dispersion may also be useful as a makeshift if the calculation
of the standard deviation has been rendered difficult or impossible
owing to the employment of an irregular classification of the
frequency or of an indefinite terminal class. Such uses are,
however, a little exceptional, and, generally speaking, the

        <pb n="169" />
semi-interquartile range as a measure of dispersion is not to be
recommended, unless simplicity of meaning is of primary im-
portance, owing to the lack of algebraical convenience which
it shares with the median. Further, it is obvious that the
quartile, like the median, may become indeterminate, and that
the use of this measure of dispersion is undesirable in cases of
discontinuous variation: the student should refer again to the
discussion of the similar disadvantage in the case of the median,
Chap. VII. § 14. It has, however, been largely used in the past,
particularly for anthropometric work.

25. Measures of Relative Dispersion.—As was pointed out in
Chapter VII. § 26, if relative size is regarded as influencing not only
the average, but also deviations from the average, the geometric
mean seems the natural form of average to use, and deviations
should be measured by their ratios to the geometric mean. As
already stated, however, this method of measuring deviations, with
its accompanying employment of the geometric mean, has never
come into general use. It is a much more simple matter to allow
for the influence of size by taking the ratio of the measure of
absolute dispersion (e.g. standard deviation, mean deviation, or
quartile deviation) to the average (mean or median) from which
the deviations were measured. Pearson has termed the quantity

$$v = 100\,\frac{\sigma}{M},$$

i.e. the percentage ratio of the standard deviation to the arithmetic
mean, the coefficient of variation (ref. 7), and has used it, for
example, in comparing the relative variations of corresponding
organs or characters in the two sexes: the ratio of the quartile
deviation to the median has also been suggested (Verschaeffelt,
ref. 8). Such a measure of relative dispersion is evidently a mere
number, and its magnitude is independent of the units of
measurement employed.
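A short Python sketch showing that v is a mere number. The hypothetical statures below give the same coefficient of variation whether recorded in inches or in centimetres. The standard deviation again uses divisor N.

```python
from statistics import fmean, pstdev

def coefficient_of_variation(values):
    """Pearson's coefficient of variation, v = 100 * sigma / M."""
    return 100 * pstdev(values) / fmean(values)

# Hypothetical statures in inches, and the same statures in centimetres.
inches = [64.0, 66.0, 67.0, 68.0, 70.0]
centimetres = [2.54 * x for x in inches]

print(round(coefficient_of_variation(inches), 3))        # 2.985
print(round(coefficient_of_variation(centimetres), 3))   # 2.985, independent of units
```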

26. Measures of Asymmetry or Skewness.—If we have to compare
a series of distributions of varying degrees of asymmetry, or skew-
ness, as Pearson has termed it, some numerical measure of this
character is desirable. Such a measure of skewness should
obviously be independent of the units in which we measure the
variable—e.g. the skewness of the distribution of the weights of a
given set of men should not be dependent on our choice of the
pound, the stone, or the kilogramme as the unit of weight—and
the measure should accordingly be a mere number. Thus the
difference between the deviations of the two quartiles on either
side of the median indicates the existence of skewness, but to
measure the degree of skewness we should take the ratio of this
        <pb n="170" />
difference to some quantity of the same dimensions, e.g. the semi-
interquartile range. Our measure would then be, taking the
skewness to be positive if the longer tail of the distribution runs
in the direction of high values of X,

$$\text{skewness} = \frac{(Q_3 - M_i) - (M_i - Q_1)}{Q} = \frac{Q_1 + Q_3 - 2M_i}{Q}. \qquad (11)$$

This would not be a bad measure if we were using the quartile
deviation as a measure of dispersion: its lowest numerical value is zero,
when the distribution is symmetrical; and while its highest possible
numerical value is 2, it would rarely in practice attain numerical
values higher than 1. A similar measure might be based on the mean
deviations in excess and in defect of the mean. There is, however,
only one generally recognised measure of skewness, and that is
Pearson's measure (ref. 9):

$$\text{skewness} = \frac{\text{mean} - \text{mode}}{\text{standard deviation}}. \qquad (12)$$

This is evidently zero for a symmetrical distribution, in which
mode and mean coincide. No upper limit to the ratio is apparent
from the formula, but, as a fact, the value does not exceed unity for
frequency-distributions resembling generally the ideal distributions
of fig. 9. As the mode is a difficult form of average to determine
by elementary methods, it may be noted that the numerator of the
above fraction may, in the case of frequency-distributions of the
forms referred to, be replaced approximately by 3(mean − median)
(cf. Chap. VII. § 20). The measure (12) is much more sensitive
than (11) for moderate degrees of asymmetry.
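Both measures are simple to compute. In this Python sketch the function names are hypothetical. The third function uses the approximation mean − mode ≈ 3(mean − median), which the text recommends only for moderately asymmetrical distributions of the forms referred to.

```python
from statistics import fmean, median, pstdev

def quartile_skewness(q1, mi, q3):
    """Equation (11): (Q1 + Q3 - 2Mi) / Q, where Q = (Q3 - Q1) / 2."""
    q = (q3 - q1) / 2
    return (q1 + q3 - 2 * mi) / q

def pearson_skewness(mean, mode, sd):
    """Equation (12): (mean - mode) / standard deviation."""
    return (mean - mode) / sd

def pearson_skewness_approx(values):
    """Equation (12) with mean - mode replaced by 3(mean - median);
    only suitable for moderately asymmetrical distributions."""
    return 3 * (fmean(values) - median(values)) / pstdev(values)

print(quartile_skewness(1.0, 2.0, 3.0))   # 0.0, quartiles symmetrical about Mi
print(quartile_skewness(0.0, 0.0, 2.0))   # 2.0, the extreme case Mi = Q1

sample = [1, 1, 1, 2, 2, 3, 10]             # hypothetical, long tail to the right
print(pearson_skewness_approx(sample) > 0)  # True: positive skewness
```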

27. The Method of Percentiles.—We may conclude this chapter
by describing briefly a method that has been largely used in the
past in lieu of the methods dealt with in Chapters VI. and VII.,
and the preceding paragraphs of this chapter, for summarising
such statistics as we have been considering. If the values of the
variable (variates, as they are sometimes termed) be ranged in
order of magnitude, and a value P of the variable be determined
such that a percentage p of the total frequency lies below it and
100 − p above, then P is termed a percentile. If a series of
percentiles be determined for short intervals, e.g. 5 per cent. or 10
per cent., they suffice by themselves to show the general form
of the distribution. This is Sir Francis Galton’s method of
percentiles. The deciles, or values of the variable which divide
the total frequency into ten equal parts, form a natural and
convenient series of percentiles to use. The fifth decile, or value
of the variable which has 50 per cent. of the observed values

        <pb n="171" />
above it and 50 per cent. below, is the median: the two quartiles
lie between the second and third and the seventh and eighth
deciles respectively.
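For ungrouped data the deciles can be found by the same ranking rule used for the quartiles. The Python sketch below repeats that helper, whose interpolation between ranks is an assumption. On hypothetical data, it checks that the fifth decile is the median and that the quartiles lie between the second and third and the seventh and eighth deciles.

```python
import math
from statistics import median

def ranked_quantile(values, p):
    """Value with a fraction p of the N observations below it: rank p*N + 1/2,
    interpolated linearly between neighbouring ranks (an assumption)."""
    xs = sorted(values)
    n = len(xs)
    rank = min(max(p * n + 0.5, 1.0), float(n))
    k = math.floor(rank)
    frac = rank - k
    if frac == 0:
        return xs[k - 1]
    return xs[k - 1] + frac * (xs[k] - xs[k - 1])

values = [3, 5, 6, 6, 7, 8, 8, 9, 10, 12, 13, 15, 18, 21, 30]   # hypothetical
deciles = [ranked_quantile(values, k / 10) for k in range(1, 10)]
q1, q3 = ranked_quantile(values, 0.25), ranked_quantile(values, 0.75)

print(deciles[4] == median(values))    # True: the fifth decile is the median
print(deciles[2] >= q1 >= deciles[1], deciles[7] >= q3 >= deciles[6])   # True True
```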

28. The deciles, like the median and quartiles, may be
determined either by arithmetical or by graphical interpolation,
excluding the cases in which, like the former constants, they
become indeterminate (cf. § 24). It is hardly necessary to give
an illustration of the former process, as the method is precisely
the same as for median and quartiles (Chap. VII. § 15, and above,
§ 22). Fig. 26 shows, of course on a very much reduced scale, the

[Fig. 26. Curve showing the number of Districts of England and Wales in which the Pauperism on 1st January 1891 did not exceed any given percentage of the population (same data as Fig. 10, p. 92): graphical determination of Deciles. Horizontal axis: percentage of the population in receipt of relief.]
curve used for obtaining the deciles by the graphical method in
the case of the distribution of pauperism (Example ii. above).
The figures of the original table are added up step by step from
the top, so as to give the total frequency not exceeding the upper
limit of each class-interval, and ordinates are then erected to a
horizontal base to represent on some scale these integrated
frequencies: a smooth curve is then drawn through the tops of
the ordinates so obtained. This curve, as will be seen from the
figure, rises slowly at first when the frequencies are small, then
more rapidly as they increase, and finally turns over again and
becomes quite flat as the frequencies tail off to zero. The deciles
        <pb n="172" />
may be readily obtained from such a curve by dividing the
terminal ordinate into ten equal parts, and projecting the points
so obtained horizontally across to the curve and then vertically
down to the base. The construction is indicated on the figure for
the fourth decile, the value of which is approximately 2.88 per cent.
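The graphical construction can be imitated numerically. This Python sketch builds the cumulative (integrated) frequencies at the upper limit of each class. It then divides the terminal ordinate into ten parts and reads each across to the curve. Joining the plotted points by straight lines instead of a smooth curve is an assumption, and the frequency table is hypothetical.

```python
from itertools import accumulate

def ogive_points(edges, freqs):
    """(class limit, cumulative frequency) pairs: the tops of the ordinates,
    starting from zero at the lowest limit."""
    return list(zip(edges, [0, *accumulate(freqs)]))

def read_ogive(points, ordinate):
    """Project a given ordinate across to the curve and down to the base,
    joining the plotted points by straight lines."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y1 >= ordinate and y1 > y0:
            return x0 + (ordinate - y0) / (y1 - y0) * (x1 - x0)
    raise ValueError("ordinate exceeds the terminal ordinate")

def deciles_from_table(edges, freqs):
    """Divide the terminal ordinate into ten equal parts and read off nine deciles."""
    points = ogive_points(edges, freqs)
    total = points[-1][1]
    return [read_ogive(points, total * k / 10) for k in range(1, 10)]

# Hypothetical table: class limits 0-1, 1-2, 2-3, 3-4 with these frequencies.
edges = [0.0, 1.0, 2.0, 3.0, 4.0]
freqs = [10, 30, 40, 20]
print([round(x, 3) for x in deciles_from_table(edges, freqs)])
# [1.0, 1.333, 1.667, 2.0, 2.25, 2.5, 2.75, 3.0, 3.5]
```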

29. The curve of fig. 26 may be drawn in a different way by
taking a horizontal base divided into ten or a hundred equal
parts (grades, as Sir Francis Galton has termed them), and erecting
at each point so obtained a vertical proportional to the cor-
responding percentile. This gives the curve of fig. 27, which was
obtained by merely redrafting fig. 26. The curve is of so-called

[Fig. 27. The curve of Fig. 26 redrawn so as to give the Pauperism corresponding to each grade: Galton's “Ogive.” Horizontal axis: grades, 0 to 100.]
ogive form. The ogive curve for the distribution of statures
(Example iii.) is shown for comparison in fig. 28. It will be noticed
that the ogive curve does not bring out the asymmetry of the
distribution of pauperism nearly so clearly as the frequency-
polygon, fig. 10, p. 92.

30. The method of percentiles has some advantages as a method
of representation, as the meaning of the various percentiles is so
simple and readily understood. An extension of the method to
the treatment of non-measurable characters has also become of
some importance. For example, the capacity of the different boys
in a class as regards some school subject cannot be directly
measured, but it may not be very difficult for the master to

        <pb n="173" />
arrange them in order of merit as regards this character: if the
boys are then “numbered up” in order, the number of each boy,
or his rank, serves as some sort of index to his capacity (cf. the
remarks in § 12). It should be noted that rank in this sense is
not quite the same as grade: if a boy is tenth, say, from the
bottom in a class of a hundred his grade is 9.5; but the method
is in principle the same as that of grades or percentiles.
The method of ranks, grades, or percentiles in such a case may
be a very serviceable auxiliary, though, of course, it is better if
possible to obtain a numerical measure. But if, in the case of a
measurable character, the percentiles are used not merely as

[Fig. 28. Ogive Curve for Stature, same data as Fig. 6, p. 89.]
constants illustrative of certain aspects of the frequency-distribu-
tion, but entirely to replace the table giving the frequency-
distribution, serious inconvenience may be caused, as the
application of other methods to the data is barred. Given the
table showing the frequency-distribution, the reader can calculate
not only the percentiles, but any form of average or measure of
dispersion that has yet been proposed, to a sufficiently high
degree of approximation. But given only the percentiles, or at
least so few of them as the nine deciles, he cannot pass back to
the frequency-distribution, and thence to other constants, with any
degree of accuracy. In all cases of published work, therefore,
the figures of the frequency-distribution should be given ; they
are absolutely fundamental.
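The distinction between rank and grade in § 30 reduces to a single formula. This Python sketch uses a hypothetical function name. It treats each member as standing half above and half below his own place, which reproduces the text's example of the boy tenth from the bottom in a class of a hundred.

```python
def grade(rank_from_bottom, n):
    """Grade (0-100 scale) of the member ranked `rank_from_bottom` in a group of n.

    The member is counted as standing half above and half below his own
    place, so the grade is the percentage of the group below him.
    """
    return 100 * (rank_from_bottom - 0.5) / n

print(grade(10, 100))   # 9.5: tenth from the bottom in a class of a hundred
print(grade(21, 41))    # 50.0: the middle member of a class of 41
```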
        <pb n="174" />
REFERENCES.
General.

(1) FECHNER, G. T., “Ueber den Ausgangswerth der kleinsten Abweichungssumme,
dessen Bestimmung, Verwendung und Verallgemeinerung,”
Abh. d. kgl. sächs. Ges. d. Wissenschaften, vol. xviii. (also numbered
vol. xi. of the Abh. d. math.-phys. Classe); Leipzig, 1878, p. 1.

Standard Deviation.

(2) PEARSON, KARL, “Contributions to the Mathematical Theory of Evolution
(i. On the Dissection of Asymmetrical Frequency-curves),” Phil. Trans.
Roy. Soc., Series A, vol. clxxxv., 1894, p. 71. (Introduction of the
term “standard deviation,” p. 80.)

Mean Deviation.

(3) LAPLACE, PIERRE SIMON, Marquis de, Théorie analytique des probabilités:
2me supplément, 1818. (Proof that the mean deviation is a
minimum when taken about the median.)

(4) TRACHTENBERG, M. I., “A Note on a Property of the Median,” Jour.
Roy. Stat. Soc., vol. lxxviii., 1915, p. 454. (A very simple proof of
the same property.)

Method of Percentiles, including Quartiles, etc.

(5) GALTON, FRANCIS, “Statistics by Intercomparison, with Remarks on the
Law of Frequency of Error,” Phil. Mag., vol. xlix. (4th Series), 1875,
pp. 33-46.

(6) GALTON, FRANCIS, Natural Inheritance ; Macmillan, 1889. (The method
of percentiles is used throughout, with the quartile deviation as the
measure of dispersion.)

Relative Dispersion.

(7) PEARSON, KARL, “Regression, Heredity, and Panmixia,” Phil. Trans.
Roy. Soc., Series A, vol. clxxxvii., 1896, p. 253. (Introduction of
“coefficient of variation,” pp. 276-7.)

(8) VERSCHAEFFELT, E., “Ueber graduelle Variabilität von pflanzlichen
Eigenschaften,” Ber. deutsch. bot. Ges., Bd. xii., 1894, pp. 350-55.

Skewness.

(9) PEARSON, KARL, “Skew Variation in Homogeneous Material,” Phil.
Trans. Roy. Soc., Series A, vol. clxxxvi., 1895, p. 343. (Introduction
of term, p. 370.)

Calculation of Mean, Standard-deviation, or of the General
Moments of a Grouped Distribution.

We have given a direct method that seems the simplest and best for
the elementary student. A process of successive summation that has
some advantages can, however, be used instead. The student will
find a convenient description with illustrations in—

(10) ELDERTON, W. PALIN, Frequency-curves and Correlation; C. &amp; E.
Layton, London, 1906.

        <pb n="175" />
EXERCISES.

1. Verify the following from the data of Table VI., Chap. VI., continuing
the work from the stage reached for Qu. 1, Chap. VII.

Stature in Inches for Adult Males born in:
                                      England  Scotland  Wales  Ireland
Standard deviation                       2.56      2.50   2.35     2.17
Mean deviation                           2.05      1.95   1.82     1.69
Quartile deviation                       1.78      1.56   1.46     1.35
Mean deviation/standard deviation        0.80      0.78   0.78     0.78
Quartile deviation/standard deviation    0.69      0.62   0.62     0.62
Lower quartile                          65.55     66.92  65.06    66.39
Upper quartile                          69.10     70.04  67.98    69.10

2. (Continuing from Qu. 2, Chap. VII.) Find the standard deviation,
mean deviation, quartiles and quartile deviation (or semi-interquartile range)
for the distribution of weights of adult males in the United Kingdom given in
the last column of Table IX., Chap. VI.

Compare the ratios of the mean and quartile deviations to the standard
deviation with the ratios stated in §§ 19 and 23 to be usual.

Find the value of the skewness (equation 12), using the approximate value
of the mode.

3. Using, or extending if necessary, your diagram for Question 4, Chap. VII.
find the quartile values for houses assessed to inhabited house duty in 1885-6,
from the data of Table IV., Chap. VI.

Find also the 9th decile (the value exceeded by 10 per cent. of the houses
only).

4. Verify equation (9) by direct calculation of the standard deviation of the
numbers 1 to 10.

5. (Data from Sauerbeck, Jour. Roy. Stat. Soc., March 1909.) The
following are the index-numbers (percentages) of prices of 45 commodities in
1908 on their average prices in the years 1867-77: 40, 43, 43, 46, 46, 46,
54, 56, 59, 62, 64, 64, 66, 66, 67, 67, 68, 68, 69, 69, 69, 71, 75, 75, 76, 76,
78, 80, 82, 82, 82, 82, 82, 83, 84, 86, 88, 90, 90, 91, 91, 92, 95, 102, 127.
Find the mean and standard deviation (1) without further grouping; (2)
grouping the numbers by fives (40-, 45-, 50-, etc.); (3) grouping by tens (40-,
50-, 60-, etc.).

6. (Continuing from Qu. 8, Chap. VII.) Supposing the frequencies of
values 0, 1, 2, 3, ... of a variable to be given by the terms of the binomial
series

$$q^n,\quad nq^{n-1}p,\quad \frac{n(n-1)}{1 \cdot 2}\,q^{n-2}p^2,\quad \ldots$$

where p + q = 1, find the standard deviation.

7. (Cf. the remarks at the end of § 17.) The sum of the deviations (without
regard to sign) about the centre of the class-interval containing the mean
        <pb n="176" />
(or median), in a grouped frequency-distribution, is found to be S. Find the
correction to be applied to this sum, in order to reduce it to the mean (or
median) as origin, on the assumption that the observations are evenly dis-
tributed over each class-interval. Take the number of observations below the
interval containing the mean (or median) to be n₁, in that interval n₂, and
above it n₃; and the distance of the mean (or median) from the arbitrary
origin to be d.

Show that the values of the mean deviation (from the mean and from the
median respectively) for Example ii., found by the use of this formula, do not
differ from the values found by the simpler method of §§ 16 and 17 in the
second place of decimals.

8. (W. Scheibner, “Ueber Mittelwerthe,” Berichte der kgl. sächsischen
Gesellschaft d. Wissenschaften, 1873, p. 564, cited by Fechner, ref. 2 of
Chap. VII.: the second form of the relation is given by G. Duncker (Die
Methode der Variationsstatistik; Leipzig, 1899) as an empirical one.) Show
that if deviations are small compared with the mean, so that (x/M)³ may be
neglected in comparison with x/M, we have approximately the relation

$$G = M\left(1 - \frac{1}{2}\,\frac{\sigma^2}{M^2}\right),$$

where G is the geometric mean, M the arithmetic mean, and σ the standard
deviation: and consequently to the same degree of approximation M² − G² = σ².

9. (Scheibner, loc. cit., Qu. 8.) Similarly, show that if deviations are small
compared with the mean, we have approximately

$$H = M\left(1 - \frac{\sigma^2}{M^2}\right),$$

H being the harmonic mean.
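A numerical check of the approximations in Questions 8 and 9 (not a proof). It uses Python's `statistics` module on hypothetical observations whose deviations are small compared with the mean:

```python
import math
from statistics import fmean, geometric_mean, harmonic_mean, pstdev

values = [99.0, 100.0, 101.0]   # hypothetical: deviations small relative to the mean
M = fmean(values)
s2 = pstdev(values) ** 2        # sigma squared, divisor N

G_approx = M * (1 - 0.5 * s2 / M ** 2)   # Question 8
H_approx = M * (1 - s2 / M ** 2)         # Question 9

print(round(geometric_mean(values), 6), round(G_approx, 6))
print(round(harmonic_mean(values), 6), round(H_approx, 6))
print(round(M ** 2 - geometric_mean(values) ** 2, 4), round(s2, 4))
```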

        <pb n="177" />
        CHAPTER IX.
CORRELATION.

1-3. The correlation table and its formation. 4-5. The correlation surface.
6-7. The general problem. 8-9. The line of means of rows and the
line of means of columns: their relative positions in the case of
independence and of varying degrees of correlation. 10-14. The
correlation-coefficient, the regressions, and the standard-deviations of
arrays. 15-16. Numerical calculations. 17. Certain points to be
remembered in calculating and using the coefficient.

1. In Chapters VI.-VIII. we considered the frequency-distribution
of a single variable, and the more important constants
that may be calculated to describe certain characters of such
distributions. We have now to proceed to the case of two
variables, and the consideration of the relations between them.

2. If the corresponding values of two variables be noted
together, the methods of classification employed in the preceding
chapters may be applied to both, and a table of double entry or
contingency-table (Chap. V.) be formed, exhibiting the frequencies
of pairs of values lying within given class-intervals. Six such
tables are given below as illustrations for the following
variables: Table I., two measurements on a shell (Pecten).
Table II., ages of husbands and wives in England and Wales in
1901. Table III., statures of fathers and their sons (British).
Table IV., fertility of mothers and their daughters (British
peerage). Table V., the rate of discount and the ratio of reserves
to deposits in American banks. Table VI., the proportion of
male to total births, and the total numbers of births, in the
registration districts of England and Wales.

Each row in such a table gives the frequency-distribution of
the first variable for cases in which the second variable lies
within the limits stated on the left of the row. Similarly, every
column gives the frequency-distribution of the second variable
for cases in which the value of the first variable lies within the
limits stated at the head of the column. As “columns” and
“rows” are distinguished only by the accidental circumstance
        <pb n="178" />
[TABLE I. Two measurements on a shell (Pecten); variable (2): dorso-ventral diameter, mm.]
        <pb n="179" />
        TABLE II. Correlation between (1) the Age of Wife, (2) the Age of Husband, for all Husbands and Wives in England and
Wales who were residing together on the night of the Census, 1901. (Census, 1901, Summary Tables, p. 182.) Table
based on 5,317,520 pairs; condensed by omitting 000s.
[Table body: columns (1) ages of wives, rows (2) ages of husbands, in five-year groups from 15- to 85-. Row totals for husbands (thousands): 15-, 4; 20-, 240; 25-, 688; 30-, 817; 35-, 798; 40-, 700; 45-, 595; 50-, 483; 55-, 369; 60-, 271; 65-, 175; 70-, 104. Grand total 5317.]
        <pb n="180" />
TABLE III. Correlation between (1) Stature of Father and (2) Stature of Son: 1 or 2 Sons only of each Father.
Measurements in inches. [From Karl Pearson and Alice Lee, Biometrika, vol. ii. (1903), p. 415.]
[Table body: columns (1) stature of father, rows (2) stature of son, by inch groups from 59.5-60.5 to 78.5-79.5.]
        <pb n="181" />
TABLE IV.—Correlation between the Number of Children (1) of a Woman, (2) of one of her Daughters. One Daughter only
taken from each Mother. Marriages lasted at least 15 years in each case. British Peerage Statistics. [From Karl
Pearson, Alice Lee, and L. Bramley Moore, Phil. Trans., A, vol. cxcii. (1899), table iv.]
(1) Number of Mother's Children. Total: 1000.
[The frequencies of this table are illegible in the scanned source.]
        <pb n="182" />
TABLE V.—Correlation between (1) Call Discount Rates and (2) Percentage of Reserves on Deposits in New York Associated Banks
(Weekly Returns). (From Statistical Studies in the New York Money Market, by J. P. Norton. Publications of the
Department of the Social Sciences, Yale University; The Macmillan Co., 1902.) Note that, after the column headed
8 per cent., blank columns have been omitted to save space.
(1) Call Discount Rates.
[The frequencies of this table are illegible in the scanned source.]
        <pb n="183" />
TABLE VI.—Showing the Number of Registration Districts in England and Wales exhibiting (1) a given Proportion of
Male Births, (2) a given Total Number of Births during the Decade 1881-90. (The Data as to Total Births and
Numbers of Male and Female Births from Decennial Supplement to Report of the Registrar-General. Table from H. D.
Vigor and G. U. Yule, Jour. Roy. Stat. Soc., vol. lxix., 1906.)
(1) Proportion of Male Births per 1000 of all Births, at the head of the table; (2) Total Number of Births, at the side, in classes 0-4, 4-8, ..., 148-152.
[The frequencies of this table are illegible in the scanned source.]
        <pb n="184" />
        164 THEORY OF STATISTICS.

of the one set running vertically and the other horizontally, and
the difference has no statistical significance, the word array
has been suggested as a convenient term to denote either a row
or a column. If the values of X in one array are associated
with values of Y between the limits $Y_n - \delta$ and $Y_n + \delta$, $Y_n$ may be
termed the type of the array. (Pearson, ref. 6.) The special
kind of contingency tables with which we are now concerned
are called correlation tables, to distinguish them from tables
based on unmeasured qualities and so forth.

3. Nothing need be added to what was said in Chapter VI. as
regards the choice of magnitude and position of class-intervals.
When these have been fixed, the table is readily compiled by
taking a large sheet ruled with rows and columns properly
headed in the same way as the final table and entering a dot,
stroke, or small cross in the corresponding compartment for each
pair of recorded observations. If facility of checking be of
great importance, each pair of recorded values may be entered
on a separate card and these dealt into little packs on a board
ruled in squares, or into a divided tray; each pack can then be
run through to see that no card has been mis-sorted. The
difficulty as to the intermediate observations—values of the
variables corresponding to divisions between class-intervals—will
be met in the same way as before if the value of one variable
alone be intermediate, the unit of frequency being divided
between two adjacent compartments. If both values of the pair
be intermediates, the observation must be divided between four
adjacent compartments, and thus quarters as well as halves may
occur in the table, as, e.g., in Table III. In this case the statures
of fathers and sons were measured to the nearest quarter-
inch and subsequently grouped by 1-inch intervals: a pair in
which the recorded stature of the father is 60.5 in. and that of
the son 62.5 in. is accordingly entered as 0.25 to each of the
four compartments under the columns 59.5-60.5, 60.5-61.5, and
the rows 61.5-62.5, 62.5-63.5. Workers will generally form
their own methods for entering such fractional frequencies
during the process of compiling, but one convenient method is
to use a small x to denote a unit and a dot for a quarter; the
four dots should be placed in the position of the four points
of the x and joined when complete. It is best to choose the
limits of class-intervals, where possible, in such a way as to avoid
fractional frequencies.
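The rule for intermediate observations can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the author's procedure: class-intervals are taken to be of equal width starting from a given lower limit, and the helper names and the (row, column) indexing are hypothetical. Rows are classes of the variable at the side of the table and columns classes of the variable at the head, as in Table III.

```python
import math
from collections import defaultdict


def class_shares(value, lower, width):
    """Share of one recorded value falling in each class-interval.

    Classes are numbered 0, 1, 2, ... from `lower` in steps of `width`.
    A value lying exactly on a division between two classes is split
    half-and-half between them; any other value goes wholly to its class.
    """
    pos = (value - lower) / width
    nearest = round(pos)
    if math.isclose(pos, nearest, abs_tol=1e-9):
        return {nearest - 1: 0.5, nearest: 0.5}
    return {math.floor(pos): 1.0}


def correlation_table(pairs, x_lower, x_width, y_lower, y_width):
    """Compile {(row, column): frequency} from (x, y) pairs.

    If both values of a pair are intermediate, the unit of frequency is
    divided into quarters over four adjacent compartments.
    """
    table = defaultdict(float)
    for x, y in pairs:
        for col, x_share in class_shares(x, x_lower, x_width).items():
            for row, y_share in class_shares(y, y_lower, y_width).items():
                table[(row, col)] += x_share * y_share
    return dict(table)


# Father 60.5 in., son 62.5 in., with 1-inch classes starting at 59.5 in.
print(correlation_table([(60.5, 62.5)], 59.5, 1.0, 59.5, 1.0))
```

With 1-inch classes starting at 59.5 in., the pair (60.5, 62.5) falls on divisions of both scales, so each of the four compartments (columns 59.5-60.5 and 60.5-61.5, rows 61.5-62.5 and 62.5-63.5) receives 0.25, as described above.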

4. The distribution of frequency for two variables may be
represented by a surface or solid in the same way as the frequency-
distribution of a single variable may be represented by a plane
figure. We may imagine the surface to be obtained by erecting
        <pb n="185" />
        IX.—CORRELATION. 165
at the centre of every compartment of the correlation-table a
vertical of length proportionate to the frequency in that com-
partment, and joining up the tops of the verticals. If the
compartments were made smaller and smaller while the class-
frequencies remained finite, the irregular figure so obtained would
approximate more and more closely towards a continuous curved
surface—a frequency-surface — corresponding to the frequency-
curves for single variables of Chapter VI. The volume of the
frequency-solid over any area drawn on its base gives the
frequency of pairs of values falling within that area, just as the
area of the frequency-curve over any interval of the base-line gives
the frequency of observations within that interval. Models of
actual distributions may be constructed by drawing the frequency-
distributions for all arrays of the one variable, to the same scale,
on sheets of cardboard, and erecting the cards vertically on a
base-board at equal distances apart, or by marking out a base-
board in squares corresponding to the compartments of the
correlation-table, and erecting on each square a rod of wood of
height proportionate to the frequency. Such solid representations
of frequency-distributions for two variables are sometimes termed
stereograms.
5. It is impossible, however, to group the majority of
frequency-surfaces, in the same way as the frequency-curves,
under a few simple types: the forms are too varied. The simplest
ideal type is one in which every section of the surface is a sym-
metrical curve—the first type of Chap. VI. (fig. 5, p. 89). Like
the symmetrical distribution for the single variable, this is a very
rare form of distribution in economic statistics, but approximate
illustrations may be drawn from anthropometry. Fig. 29 shows
the ideal form of the surface, somewhat truncated, and fig.
30 the distribution of Table III., which approximates to the same
type,—the difference in steepness is, of course, merely a matter of
scale. The maximum frequency occurs in the centre of the
whole distribution, and the surface is symmetrical round the
vertical through the maximum, equal frequencies occurring at
equal distances from the mode on opposite sides. The next
simplest type of surface corresponds to the second type of
frequency-curve—the moderately asymmetrical. Most, if not all,
of the distributions of arrays are asymmetrical, and like the dis-
tribution of fig. 9, p. 92: the surface is consequently asymmetrical,
and the maximum does not lie in the centre of the distribution.
This form is fairly common, and illustrations might be drawn
from a variety of sources: economics, meteorology, anthropometry,
etc. The data of Table II. will serve as an example. The total
distributions and the distributions of the majority of the arrays

        <pb n="186" />
FIG. 29.—The ideal symmetrical (“normal”) Frequency-Surface, with the extremes truncated.
        <pb n="187" />
Theory of Statistics.]   [To face page 166.
FIG. 30.—Frequency Surface for Stature of Father and Stature of Son (data of Table III.): the surface is approximately of the ideal symmetrical “normal” form.
        <pb n="188" />
        <pb n="189" />
FIG. 31.—Frequency Surface for the Rate of Discount and Ratio of Reserves to Deposits in American Banks (data of Table V.).
        <pb n="190" />
        <pb n="191" />
are asymmetrical, the skewness being positive for the rows at
the top of the table (the mode being lower than the mean), and
negative for the rows at the foot, the more central rows being
nearly symmetrical. The maximum frequency lies towards the
upper end of the table in the compartment under the row and
column headed “30-”. The frequency falls off very rapidly
towards the lower ages, and slowly in the direction of old age.
Outside these two forms, it seems impossible to delimit empirically
any simple types. Tables V. and VI. are given simply as illus-
trations of two very divergent forms. Fig. 31 gives a graphical
representation of the former by the method corresponding to the
histogram of Chapter VI., the frequency in each compartment
being represented by a square pillar. The distribution of
frequency is very characteristic, and quite different from that
of any of the Tables I., II., III., or IV.
6. It is clear that such tables may be treated by any of the
methods discussed in Chapter V., which are applicable to all
contingency-tables, however formed. The distribution may be
investigated in detail by such methods as those of § 4, or tested
for isotropy (§ 11), or the coefficient of contingency can be
calculated (§§ 5-8). In applying any of these methods, however,
it is desirable to use a coarser classification than is suited to the
methods to be presently discussed, and it is not necessary to
retain the constancy of the class-interval. The classification
should, on the contrary, be arranged simply with a view to avoiding
many scattered units or very small frequencies. A few examples
should be worked as exercises by the student (Question 3).
7. But the coefficient of contingency merely tells us whether,
and if so, how closely, the two variables are related, and much
more information than this can be obtained from the correlation-
table, seeing that the measures of Chapters VII. and VIII. can be
applied to the arrays as well as to the total distributions. If the
two variables are independent, the distributions of all parallel
arrays are similar (Chap. V. § 13); hence their averages and
dispersions, e.g. means and standard deviations, must be the same.
In general they are not the same, and the relation between the
mean or standard deviation of the array and its type requires
investigation. Of the two constants, the mean is, in general, the
more important, and our attention will for the present be con-
fined to it. The majority of the questions of practical statistics
relate solely to averages: the most important and fundamental
question is whether, on an average, high values of the one variable
show any tendency to be associated with high (or with low)
values of the other. If possible, we also desire to know how great
a divergence of the one variable from its average value is associated
        <pb n="192" />
with a unit divergence of the other, and to obtain some idea as to
the closeness with which this relation is usually fulfilled.
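As a sketch of the idea (the small table, the class mid-values and the function name below are hypothetical), the means of arrays can be read off a correlation table directly: the mean of X for every row and the mean of Y for every column.

```python
def array_means(table, x_mids, y_mids):
    """Means of rows and of columns of a correlation table.

    table[i][j] is the frequency for the i-th class of Y (row) and the
    j-th class of X (column); x_mids and y_mids are the class mid-values.
    An empty array has no mean and is reported as None.
    """
    row_means = []
    for row in table:
        n = sum(row)
        total = sum(f * x for f, x in zip(row, x_mids))
        row_means.append(total / n if n else None)

    col_means = []
    for j in range(len(x_mids)):
        column = [row[j] for row in table]
        n = sum(column)
        total = sum(f * y for f, y in zip(column, y_mids))
        col_means.append(total / n if n else None)
    return row_means, col_means


# Hypothetical 3 x 3 table in which high X tends to go with high Y.
table = [
    [4, 2, 0],
    [2, 4, 2],
    [0, 2, 4],
]
rows, cols = array_means(table, x_mids=[1, 2, 3], y_mids=[10, 20, 30])
print(rows)  # mean of X in each row
print(cols)  # mean of Y in each column
```

If the rows of a table are proportional to one another, as under independence, every row gives the same mean, and likewise every column; in the hypothetical table above, by contrast, both sets of means rise steadily.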

8. Suppose a diagram (fig. 32) to be drawn representing the
values of means of arrays. Let OX, OY be the scales of the two
variables, i.e. the scales at the head and side of the table, 01, 12,
etc., being successive class-intervals. Let $M_x$ be the mean value
of X, and $M_y$ the mean value of Y. If the two variables be
absolutely independent, the distributions of frequency in all
parallel arrays are similar (Chap. V. § 13), and the means of arrays
must lie on the vertical and horizontal lines $M_xM$, $M_yM$, the
small circles denoting means of rows and the small crosses means
of columns. (In any actual case, of course, the means would not
lie so regularly, but, if the independence were almost complete,
would only fluctuate slightly to the one side and the other of the
two lines.)

FIG. 32.

The cases with which the experimentalist, e.g. the chemist or
physicist, has to deal, where the observations are all crowded
closely round a single line, lie at the opposite extreme from
independence. The entries fall into a few compartments only of
each array, and the means of rows and of columns lie approximately
on one and the same curve, like the line RR of fig. 33.

The ordinary cases of statistics are intermediate between these
two extremes, the lines of means being neither at right angles as

        <pb n="193" />
in fig. 32, nor coincident as in fig. 33, but standing at an acute
angle with one another as RR (means of rows) and CC (means of
columns) in figs. 36-38. The complete problem of the statistician,
like that of the physicist, is to find formulæ or equations which
will suffice to describe approximately these curves.

FIG. 33.

9. In the general case this may be a difficult problem, but, in
the first place, it often suffices, as already pointed out, to know
merely whether on an average high values of the one variable
show any tendency to be associated with high or with low values
of the other, a purpose which will be served very fairly by fitting a
straight line; and further, in a large number of cases, it is found
either (1) that the means of arrays lie very approximately round
straight lines, or (2) that they lie so irregularly (possibly owing
only to paucity of observations) that the real nature of the curve
is not clearly indicated, and a straight line will do almost as well
as any more elaborate curve. (Cf. figs. 36-38.) In such cases
—and they are relatively more frequent than might be supposed
—the fitting of straight lines to the means of arrays determines
all the most important characters of the distribution. We might
fit such lines by a simple graphical method, plotting the points
representing means of arrays on a diagram like those of figures
36-38, and “fitting” lines to them, say, by means of a stretched
black thread shifted about till it appeared to run as near as
        <pb n="194" />
might be to all the points. But such a method is hardly satis-
factory, more especially if the points are somewhat scattered; it
leaves too much room for guesswork, and different observers obtain
very different results. Some method is clearly required which
will enable the observer to determine equations to the two lines
for a given distribution, however irregularly the means may lie,
as simply and definitely as he can calculate the means and
standard deviations.
FIG. 34.

10. Consider the simplest case in which the means of rows lie
exactly on a straight line RR (fig. 34). Let $M_y$ be the mean
value of Y, and let RR cut $M_yx$, the horizontal through $M_y$, in M.
Then it may be shown that the vertical through M must cut OX
in $M_x$, the mean of X. For, let the slope of RR to the vertical,
i.e. the tangent of the angle which RR makes with the vertical through M, be $b_1$,
and let deviations from $M_x$, $M_y$ be denoted by x and y. Then for
any one row of type y in which the number of observations is n,
$\Sigma(x) = nb_1y$, and therefore for the whole table, since $\Sigma(ny) = 0$,
$\Sigma(x) = b_1\Sigma(ny) = 0$. $M_x$ must therefore be the mean of X, and
M may accordingly be termed the mean of the whole distribution.
Knowing that RR passes through M, it remains only to determine
        <pb n="195" />
$b_1$. This may conveniently be done in terms of the mean product
p of all pairs of associated deviations x and y, i.e.

$$p = \frac{1}{N}\Sigma(xy) \tag{1}$$

For any one row we have

$$\Sigma(xy) = y\,\Sigma(x) = nb_1y^2.$$

Therefore for the whole table

$$\Sigma(xy) = b_1\Sigma(ny^2) = Nb_1\sigma_y^2,$$

$$b_1 = \frac{p}{\sigma_y^2} \tag{2}$$

Similarly, if CC be the line on which lie the means of columns
and $b_2$ its slope to the horizontal,

$$b_2 = \frac{p}{\sigma_x^2} \tag{3}$$

These two equations (2) and (3) are usually written in a
slightly different form. Let

$$r = \frac{p}{\sigma_x\sigma_y} \tag{4}$$

Then

$$b_1 = r\frac{\sigma_x}{\sigma_y}, \qquad b_2 = r\frac{\sigma_y}{\sigma_x} \tag{5}$$

Or we may write the equations to RR and CC:

$$x = r\frac{\sigma_x}{\sigma_y}\,y, \qquad y = r\frac{\sigma_y}{\sigma_x}\,x \tag{6}$$

These equations may, of course, be expressed, if desired, in
terms of the absolute values of the variables X and Y instead of
the deviations x and y.
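Equations (1) to (5) translate directly into code for ungrouped observations. The following is a minimal Python sketch, assuming the observations are given as two equal-length lists; the function name and the five pairs of values are hypothetical. Standard deviations are taken with divisor N, as in the text.

```python
import math


def correlation_constants(xs, ys):
    """Mean product p, sigma_x, sigma_y, r, b1 and b2 for paired data.

    Deviations x, y are taken from the means of the two lists. Then
    p = Sigma(xy)/N (1), b1 = p/sigma_y**2 (2, slope of RR, x on y),
    b2 = p/sigma_x**2 (3, slope of CC, y on x), r = p/(sigma_x*sigma_y) (4).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [v - mean_x for v in xs]
    dy = [v - mean_y for v in ys]
    p = sum(a * b for a, b in zip(dx, dy)) / n
    sigma_x = math.sqrt(sum(a * a for a in dx) / n)
    sigma_y = math.sqrt(sum(b * b for b in dy) / n)
    r = p / (sigma_x * sigma_y)
    b1 = p / sigma_y ** 2
    b2 = p / sigma_x ** 2
    return p, sigma_x, sigma_y, r, b1, b2


p, sx, sy, r, b1, b2 = correlation_constants([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"p={p:.2f} r={r:.3f} b1={b1:.2f} b2={b2:.2f}")
# Equation (5): b1 = r*sx/sy and b2 = r*sy/sx.
print(math.isclose(b1, r * sx / sy), math.isclose(b2, r * sy / sx))
```

Computing $b_1$ and $b_2$ from p, as in (2) and (3), and then checking them against the form (5) is a convenient guard against slips in the arithmetic.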

11. The meaning of the above expressions when the means of
rows and columns do not lie exactly on straight lines is very
readily obtained. If the values of x and $b_1y$ be noted for all
pairs of associated deviations, the sum of the squares of the
differences is $\Sigma(x - b_1y)^2 = N(\sigma_x^2 - 2b_1p + b_1^2\sigma_y^2)$;
giving $b_1$ its value from (5), this becomes

$$\Sigma(x - b_1y)^2 = N\sigma_x^2(1 - r^2) \tag{7}$$

If $b_1$ be given any other value, say $(r + \delta)\sigma_x/\sigma_y$, then

$$\Sigma(x - b_1y)^2 = N\sigma_x^2(1 - r^2 + \delta^2) \tag{8}$$
        <pb n="196" />
This is necessarily greater than the value (7); hence $\Sigma(x - b_1y)^2$
has the lowest possible value when $b_1$ is put equal to $r\sigma_x/\sigma_y$.
Further, for any one row in which the number of observations
is n, let the deviation of the mean of the row from RR be d (fig. 35),
and the standard deviation of the row be s; then
$\Sigma(x - b_1y)^2 = ns^2 + nd^2$. There-
fore for the whole table,

$$\Sigma(x - b_1y)^2 = \Sigma(ns^2) + \Sigma(nd^2).$$

But the first of the two sums on the right is unaffected by the
slope or position of RR; hence, the left-hand side being a
minimum, the second sum on the right must be a minimum also.
That is to say, when $b_1$ is put equal to $r\sigma_x/\sigma_y$, the sum of the squares
of the distances of the row-means from RR, each multiplied by the
corresponding frequency, is the lowest possible.

FIG. 35.
Similar theorems hold good, of course, with respect to the line
CC. If $b_2$ be given the value $r\sigma_y/\sigma_x$, $\Sigma(y - b_2x)^2$ is a minimum,
and so also is $\Sigma(ne^2)$, e being the deviation of the mean of a column
from CC (fig. 35). Hence we may regard the equations (6)
as being either (a) equations for estimating each individual x
from its associated y (and y from its associated x) in such a way
        <pb n="197" />
as to make the sum of the squares of the errors of estimate the
least possible; or (b) equations for estimating the mean of the x's
associated with a given type of y (and the mean of the y's associated
with a given type of x) in such a way as to make the sum of the
squares of the errors of estimate the least possible, when every
mean is counted once for each observation on which it is based.

FIG. 36.—Correlation between Age of Husband and Age of Wife in England and Wales (Table II.): means of rows shown by circles and means of columns by crosses: r = +0.91.
The lines represented by the two equations are thus, in a certain
natural sense, “lines of best fit ” to the two actual lines of means.
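Equations (7) and (8) can also be checked numerically. In the sketch below, which uses hypothetical data and helper names, the sum of squares is evaluated at the value of $b_1$ given by (5) and at $(r + \delta)\sigma_x/\sigma_y$ for several values of δ; the excess over the minimum is $N\sigma_x^2\delta^2$ in every case.

```python
import math


def deviations(values):
    """Deviations of a list of values from their mean."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def residual_sum(dx, dy, b1):
    """Sigma(x - b1*y)**2 over all pairs of deviations."""
    return sum((a - b1 * b) ** 2 for a, b in zip(dx, dy))


xs = [1, 2, 3, 4, 5]  # hypothetical data
ys = [2, 4, 5, 4, 5]
dx, dy = deviations(xs), deviations(ys)
n = len(dx)
sx = math.sqrt(sum(a * a for a in dx) / n)
sy = math.sqrt(sum(b * b for b in dy) / n)
r = sum(a * b for a, b in zip(dx, dy)) / (n * sx * sy)

best = residual_sum(dx, dy, r * sx / sy)
print("eq. (7):", best, n * sx ** 2 * (1 - r ** 2))
for delta in (-0.3, 0.1, 0.5):
    other = residual_sum(dx, dy, (r + delta) * sx / sy)
    # eq. (8): larger than the minimum by N * sigma_x**2 * delta**2
    print("eq. (8):", delta, other, n * sx ** 2 * (1 - r ** 2 + delta ** 2))
```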
12. The constant r is of very great importance. It is evi-
dently a pure number, and its magnitude is unaffected by the
scales in which x and y are measured, for these scales will
affect the numerator and denominator of (4) to the same
extent. If the two variables are independent, r is zero, for $b_1$
and $b_2$ are zero (cf. § 8). The sign is the sign of the mean
product p, and accordingly r is positive if large values of x
        <pb n="198" />
are associated with large values of y, and conversely (as in
Tables I.-IV.), negative if small values of x are associated with
large values of y and conversely (as in Table V.). The numerical
value cannot exceed 1, for otherwise the sum of the series of squares
in equation (7), $N\sigma_x^2(1 - r^2)$, would be negative, and the sum of a
series of squares cannot be negative. If $r = \pm 1$, that sum is zero,
and it follows that all the observed pairs of deviations are subject
to the relation $x/y = r\sigma_x/\sigma_y$: this would be the case if the
circles and crosses in such a diagram as fig. 33 all lay on one and
the same straight line. From these properties r is termed the
coefficient of correlation, and the expression (4),

$$r = \frac{p}{\sigma_x\sigma_y} = \frac{\Sigma(xy)}{N\sigma_x\sigma_y},$$

should be remembered.

FIG. 37.—Correlation between Stature of Father and Stature of Son (Table III.): means of rows shown by circles and means of columns by crosses: r = +0.51.

It should be noted that, while r is zero if the variables are
independent, the converse is not necessarily true: the fact that
r is zero only implies that the means of rows and columns
lie scattered round two straight lines which do not exhibit
        <pb n="199" />
any definite trend, to right or to left, upward or downward.
Two variables for which r is zero are, however, conveniently
spoken of as uncorrelated. Table VI. and fig. 39 will serve as an
illustration of a case in which the variables are almost uncor-
related but by no means independent, r being very small (-0.014),
but the coefficient of contingency C (for the grouping of Question 3) being 0.47.
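A constructed example, not drawn from the text, shows how r may vanish even though one variable is completely determined by the other. Here x takes values symmetrical about zero and y = x²; the means of columns lie on a parabola with no upward or downward trend as a whole, so the product-sum is zero. The function name is hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Coefficient of correlation r = p / (sigma_x * sigma_y), divisor N."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    p = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    return p / (sx * sy)


xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x * x for x in xs]  # y is a function of x, so not independent
print(pearson_r(xs, ys))  # zero all the same: the variables are uncorrelated
```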

Figs. 36, 37, 38 are drawn from the data of Tables II., III., and
IV., for which r has the values +0.91, +0.51, and +0.21 respec-
tively, the correlation being positive in each case. The student
should study such tables and diagrams closely, and endeavour to
accustom himself to estimating the value of r from the general
appearance of the table.

FIG. 38.—Correlation between number of a Mother's Children and number of her Daughter's Children (Table IV.): means of rows shown by circles and means of columns by crosses: r = +0.21.
13. The two quantities

$$b_1 = r\frac{\sigma_x}{\sigma_y}, \qquad b_2 = r\frac{\sigma_y}{\sigma_x}$$

are termed the coefficients of regression, or simply the regressions,
$b_1$ being the regression of x on y, or deviation in x corresponding
on the average to a unit change in the type of y, and $b_2$ being
        <pb n="200" />
similarly the regression of y on x. Whilst the coefficient of
correlation is always a pure number, the regressions are only
pure numbers if the two variables have the same dimensions, as
in Tables I.-IV.: their magnitudes depend on the ratio $\sigma_x/\sigma_y$, and
consequently on the units in which x and y are measured. They
are both necessarily of the same sign (the sign of r). Since
$b_1b_2 = r^2$ and r is not numerically greater than unity, one at least
of the regressions must be not greater than unity, but the other may
be considerably greater if the ratio $\sigma_x/\sigma_y$ or $\sigma_y/\sigma_x$ be great.
The name regression arose from the term being first introduced in
the case of inheritance of stature (Galton, refs. 2, 3). In this case
the two standard deviations are very nearly equal, so that both $b_1$
and $b_2$ are less than unity, say (using the more recent data of
Table III.) 0.50 and 0.52.

FIG. 39.—Correlation between births in a Registration District and Proportion of Male Births per thousand of all births (England and Wales, 1881-90, Table VI.): means of rows shown by circles and means of columns by crosses: r = -0.014.
        <pb n="201" />
Hence the sons of fathers of deviation x from the mean of all fathers
have an average deviation of only 0.52x from the mean of all sons;
i.e. they step back or “regress” towards the general mean, and 0.52
may be termed the “ratio of regression.” In general, however,
the idea of a “stepping back” or “regression” towards a more
or less stationary mean is quite inapplicable—obviously so where
the variables are different in kind, as in Tables V. and VI.—
and the term “coefficient of regression” should be regarded simply
as a convenient name for the coefficients $b_1$ and $b_2$. RR and CC
are generally termed the “lines of regression,” and equations (6)
the “regression equations.” The expressions “characteristic lines,”
“characteristic equations” (Yule, ref. 8) would perhaps be better.
Where the actual means of arrays appear to be given, to a satis-
factory degree of approximation, by straight lines, we may say
that the regression is linear. It is not safe, however, to assume
that such linearity extends beyond the limits of observation.
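The effect of the units can be shown with six rows of Table VII, used here purely for illustration (Glendale, Belper, Uttoxeter, Godstone, Swaffham, Langport). Expressing the earnings in shillings instead of pence leaves r unchanged, divides $b_1$ by 12 and multiplies $b_2$ by 12, while $b_1b_2 = r^2$ in either system of units. The function name is hypothetical.

```python
import math


def r_and_regressions(xs, ys):
    """Return r, b1 (regression of x on y) and b2 (regression of y on x)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    p = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    r = p / (sx * sy)
    return r, r * sx / sy, r * sy / sx


# Six unions from Table VII: weekly earnings in pence, and per cent relieved.
earnings_d = [249, 222, 204, 192, 180, 150]
relieved = [2.40, 1.92, 3.01, 3.02, 5.37, 5.19]
earnings_s = [d / 12 for d in earnings_d]  # the same earnings in shillings

r_d, b1_d, b2_d = r_and_regressions(earnings_d, relieved)
r_s, b1_s, b2_s = r_and_regressions(earnings_s, relieved)
print(f"pence:     r={r_d:.3f} b1={b1_d:.3f} b2={b2_d:.4f}")
print(f"shillings: r={r_s:.3f} b1={b1_s:.3f} b2={b2_s:.4f}")
```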

14. The two standard deviations

$$s_x = \sigma_x\sqrt{1 - r^2}, \qquad s_y = \sigma_y\sqrt{1 - r^2}$$

are of considerable importance. It follows from (7) that $s_x$ is the
standard deviation of $(x - b_1y)$, and similarly $s_y$ is the standard
deviation of $(y - b_2x)$. Hence we may regard $s_x$ and $s_y$ as the
standard errors (root mean square errors) made in estimating x
from y and y from x by the respective characteristic relations
$x = b_1y$, $y = b_2x$.

$s_x$ may also be regarded as a kind of average standard deviation of
a row about RR, and $s_y$ as an average standard deviation of a
column about CC. In an ideal case, where the regression is
truly linear and the standard deviations of all parallel arrays are
equal, a case to which the distribution of Table III. is a rough
approximation, $s_x$ is the standard deviation of the x-array and $s_y$
the standard deviation of the y-array (cf. Chap. X. § 19 (3)).
Hence $s_x$ and $s_y$ are sometimes termed the “standard deviations
of arrays.”

15. Proceeding now to the arithmetical work, the only new
expression that has to be calculated in order to determine r, $b_1$, $b_2$,
$s_x$ and $s_y$ is the product sum $\Sigma(xy)$ or the mean product p. As in
the cases of means and standard deviations, the form of the
arithmetic is slightly different according as the observations are
few and ungrouped, or sufficient to justify the formation of a
correlation-table. In the first case, as in Example i. below, the
work is quite straightforward.

Example i., Table VII.—The variables are (1) X—the estimated
        <pb n="202" />
TABLE VII.—THEORY OF CORRELATION: Example i.
Columns: (1) Union; (2) X, estimated average weekly earnings of agricultural labourers, in shillings and pence; (3) Y, percentage of the population in receipt of Poor-law relief; (4), (5) x, y, deviations from the means (x in pence); (6), (7) x², y²; (8), (9) products xy, positive and negative.

(1)                      (2)    (3)   (4)     (5)    (6)      (7)     (8)      (9)
Union                      X      Y     x       y     x²       y²     +xy      -xy
                      s.  d.   p.c.  (d.)
1. Glendale            20  9   2.40   +58   -1.27   3364   1.6129            73.66
2. Wigton              20  3   2.29   +52   -1.38   2704   1.9044            71.76
3. Garstang            19  8   1.39   +45   -2.28   2025   5.1984           102.60
4. Belper              18  6   1.92   +31   -1.75    961   3.0625            54.25
5. Nantwich            17  8   2.98   +21   -0.69    441   0.4761            14.49
6. Atcham              17  6   1.17   +19   -2.50    361   6.2500            47.50
7. Driffield           17  1   3.79   +14   +0.12    196   0.0144    1.68
8. Uttoxeter           17  0   3.01   +13   -0.66    169   0.4356             8.58
9. Wetherby            17  0   2.39   +13   -1.28    169   1.6384            16.64
10. Easingwold         16 11   2.78   +12   -0.89    144   0.7921            10.68
11. Southwell          16  6   3.09    +7   -0.58     49   0.3364             4.06
12. Hollingbourn       16  4   2.78    +5   -0.89     25   0.7921             4.45
13. Melton Mowbray     16  3   2.61    +4   -1.06     16   1.1236             4.24
14. Truro              16  3   4.33    +4   +0.66     16   0.4356    2.64
15. Godstone           16  0   3.02    +1   -0.65      1   0.4225             0.65
16. Louth              16  0   4.20    +1   +0.53      1   0.2809    0.53
17. Brixworth          15  9   1.29    -2   -2.38      4   5.6644    4.76
18. Crediton           15  8   5.16    -3   +1.49      9   2.2201             4.47
19. Holbeach           15  6   4.75    -5   +1.08     25   1.1664             5.40
20. Maldon             15  6   4.64    -5   +0.97     25   0.9409             4.85
21. Monmouth           15  4   4.26    -7   +0.59     49   0.3481             4.13
22. St Neots           15  3   1.66    -8   -2.01     64   4.0401   16.08
23. Swaffham           15  0   5.37   -11   +1.70    121   2.8900            18.70
24. Thakeham           15  0   3.38   -11   -0.29    121   0.0841    3.19
25. Thame              15  0   5.84   -11   +2.17    121   4.7089            23.87
26. Thingoe            15  0   4.63   -11   +0.96    121   0.9216            10.56
27. Basingstoke        15  0   3.93   -11   +0.26    121   0.0676             2.86
28. Cirencester        15  0   4.54   -11   +0.87    121   0.7569             9.57
29. North Witchford    14 10   3.42   -13   -0.25    169   0.0625    3.25
30. Pewsey             14  9   5.88   -14   +2.21    196   4.8841            30.94
31. Bromyard           14  9   4.36   -14   +0.69    196   0.4761             9.66
32. Wantage            14  9   3.85   -14   +0.18    196   0.0324             2.52
33. Stratford on Avon  14  7   3.92   -16   +0.25    256   0.0625             4.00
34. Dorchester         14  6   4.48   -17   +0.81    289   0.6561            13.77
35. Woburn             14  6   5.67   -17   +2.00    289   4.0000            34.00
36. Buntingford        14  4   4.91   -19   +1.24    361   1.5376            23.56
37. Pershore           13  6   4.34   -29   +0.67    841   0.4489            19.43
38. Langport           12  6   5.19   -41   +1.52   1681   2.3104            62.32
Totals                                            16,018  63.0556   32.13   698.17
Mean                   15 11   3.67
σx = 20.5d.;  σy = 1.297;  Σ(xy) = 32.13 - 698.17 = -666.04.
        <pb n="203" />
average weekly earnings of agricultural labourers in 38 English
Poor-law unions of an agricultural type (the data of Example i.,
Chap. VIII. p. 137). (2) Y—the percentage of the population
in receipt of Poor-law relief on the 1st January 1891 in each of the
same unions (B return). The means of each of the variables are
calculated in the ordinary way, and then the deviations x and y
from the mean are written down (columns 4 and 5): care must
be taken to give each deviation the correct sign. These deviations
are then squared (columns 6 and 7) and the standard deviations
found as before (Chap. VIII. p. 136). Finally, every x is
multiplied by the associated y and the product entered in column
8 or column 9 according to its sign. These columns are then
added up separately and the algebraic sum of the totals gives
$\Sigma(xy) = -666.04$: therefore the mean product $p = \Sigma(xy)/N = -17.53$, and

$$r = \frac{-17.53}{20.5 \times 1.297} = -0.66.$$
There is therefore a well-marked relation exhibited by these data
between the earnings of agricultural labourers in a district and
the percentage of the population in receipt of Poor-law relief.
A penny is rather a small unit in which to measure deviations in
the average earnings, so for the regressions we may alter the unit
of x to a shilling, making $\sigma_x = 1.71$, and

$$b_1 = r\frac{\sigma_x}{\sigma_y} = -0.87, \qquad b_2 = r\frac{\sigma_y}{\sigma_x} = -0.50.$$

The regression equations are therefore, in terms of these units,

$$x = -0.87y, \qquad y = -0.50x.$$
For practical purposes it is more convenient to express the
equations in terms of the absolute values of the variables rather
than the deviations: therefore, replacing x by (X - 15.94) and y
by (Y - 3.67) and simplifying, we have

$$X = 19.13 - 0.87Y \tag{a}$$
$$Y = 11.64 - 0.50X \tag{b}$$

the units being 1s. for the earnings and 1 per cent. for the
pauperism. The standard errors made in using these equations
to estimate earnings from pauperism and pauperism from earnings
respectively are

$$\sigma_x\sqrt{1 - r^2} = 15.4\text{d.} = 1.28\text{s.}, \qquad \sigma_y\sqrt{1 - r^2} = 0.97 \text{ per cent.}$$
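The arithmetic of Example i. can be retraced from the summary figures of the text (N = 38, Σ(xy) = -666.04, σx = 20.5d., σy = 1.297, means 15.94s. and 3.67 per cent.). This Python sketch assumes, as the hand computation appears to do, that r is rounded to two places before being used again and that the regressions are rounded before the constant terms are worked out; the variable names are illustrative.

```python
import math

# Summary figures of Example i. (Table VII): x in pence, y in per cent.
N = 38
sum_xy = -666.04
sigma_x_pence = 20.5
sigma_y = 1.297
mean_X, mean_Y = 15.94, 3.67  # shillings; per cent

p = sum_xy / N                               # mean product
r = round(p / (sigma_x_pence * sigma_y), 2)  # coefficient of correlation
sigma_x = sigma_x_pence / 12                 # unit of X changed to the shilling
b1 = round(r * sigma_x / sigma_y, 2)         # regression of X on Y
b2 = round(r * sigma_y / sigma_x, 2)         # regression of Y on X

# x = b1*y with x = X - mean_X, y = Y - mean_Y gives X = a1 + b1*Y, etc.
a1 = round(mean_X - b1 * mean_Y, 2)
a2 = round(mean_Y - b2 * mean_X, 2)

# Standard errors of estimate, sigma * sqrt(1 - r**2)
se_x = sigma_x_pence * math.sqrt(1 - r ** 2)
se_y = sigma_y * math.sqrt(1 - r ** 2)

print(f"p = {p:.2f}, r = {r}")
print(f"(a) X = {a1} + ({b1}) Y")
print(f"(b) Y = {a2} + ({b2}) X")
print(f"errors: {se_x:.1f}d. = {se_x / 12:.2f}s.; {se_y:.2f} per cent.")
```

The same constants give the end-points used below for drawing the lines, e.g. X = 19.13 - 0.87 × 6 = 13.91 when Y = 6.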
        <pb n="204" />

The equation (b) tells us therefore that a rise of 2s. in earnings
in passing from one district to another means on the average a
fall of 1 in the percentage in receipt of relief. A natural con-
clusion would be that this means a direct effect of the higher
earnings in diminishing the necessity for relief, but such a
conclusion cannot be accepted offhand. Equation (a) indicates,
for instance, that every rise of a unit in the percentage re-
lieved corresponds to a fall of 0.87 shillings, or 10.4d., in earnings:
this might mean that the giving of relief tends to depress wages.
Which is the correct interpretation of the facts? The above
regression equations alone cannot tell us this, and it is in the
discussion of such questions that most of the difficulties of statisti-
cal arguments arise.

FIG. 40.—Correlation between Pauperism and Average Earnings of Agricultural Labourers for certain districts of England (data of Table VII.): RR, CC, lines of regression: r = -0.66.

As a check on the whole of the arithmetical work, and to test
whether the correlation coefficient is unduly affected by a few out-
lying observations, or, perhaps, by the regression not being linear,
it is always as well to draw a diagram representing the results
obtained. Take scales along two axes at right angles (fig. 40)
representing the variables, and insert a dot (better, for clearness,
a small circle or a cross) at the point determined by each observed
pair of X and Y. Complete the diagram by inserting the two lines
        <pb n="205" />
RR and CC given by the regression equations (a) and (b). In
doing this it is as well to determine a point at each end of both
lines, and then to check the work by seeing that they meet in the
mean of the whole distribution. Thus RR is determined from (a)
by the points $Y = 0$, $X = 19.13$ and $Y = 6$, $X = 13.91$: CC is
determined from (b) by the points $X = 12$, $Y = 5.64$ and $X = 21$,
$Y = 1.14$. Marking in these points, and drawing the lines, they
will be found to meet in the mean, $X = 15.94$, $Y = 3.67$. The
diagram gives a very clear idea of the distribution; clearly the
regression is as nearly linear as may be with so very scattered a
distribution, and there are no very exceptional observations. The
most exceptional districts are Brixworth and St Neots, with rather
low earnings but very low pauperism, and Glendale and Wigton,
with the highest earnings but a pauperism well above the lowest,
over 2 per cent.
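The check by diagram can also be done numerically. The Python sketch below, with hypothetical helper names, evaluates the end points of RR and CC used above and solves the two regression equations simultaneously: since $X = a_1 + b_1 Y$ and $Y = a_2 + b_2 X$, the meeting point is $X = (a_1 + b_1 a_2)/(1 - b_1 b_2)$.

```python
def line_rr(y):
    """Regression line RR, equation (a): X = 19.13 - 0.87 Y."""
    return 19.13 - 0.87 * y


def line_cc(x):
    """Regression line CC, equation (b): Y = 11.64 - 0.50 X."""
    return 11.64 - 0.50 * x


def meeting_point(a1, b1, a2, b2):
    """Solve X = a1 + b1*Y and Y = a2 + b2*X simultaneously."""
    x = (a1 + b1 * a2) / (1 - b1 * b2)
    return x, a2 + b2 * x


rr_points = [(round(line_rr(y), 2), y) for y in (0, 6)]    # (X, Y) pairs
cc_points = [(x, round(line_cc(x), 2)) for x in (12, 21)]  # (X, Y) pairs
mean_x, mean_y = meeting_point(19.13, -0.87, 11.64, -0.50)
print(rr_points, cc_points, round(mean_x, 2), round(mean_y, 2))
```

With the rounded coefficients the lines meet at about $X = 15.93$, $Y = 3.67$, which agrees with the mean to within the rounding of the coefficients.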

16. When a classified correlation-table is to be dealt with, the
procedure is of precisely the same kind as was used in the calcula-
tion of a standard deviation, the same artifices being used to shorten
the work. That is to say, (1) the product-sum is calculated in the
first instance with respect to an arbitrary origin, and is afterwards
reduced to the value it would have with respect to the mean; (2)
the arbitrary origin is taken at the centre of a class-interval; (3)
the class-interval is treated as the unit of measurement throughout
the arithmetic.

Let deviations from the arbitrary origin be denoted by $\xi$, $\eta$, and
let $\bar\xi$, $\bar\eta$ be the co-ordinates of the mean. Then

$$\xi = x + \bar\xi, \qquad \eta = y + \bar\eta,$$
$$\xi\eta = xy + \bar\xi y + \bar\eta x + \bar\xi\bar\eta.$$

Therefore, summing, since the second and third sums on the
right vanish, being the sums of deviations from the mean,

$$\Sigma(\xi\eta) = \Sigma(xy) + N\bar\xi\bar\eta,$$

or, bringing $\Sigma(xy)$ to the left,

$$\Sigma(xy) = \Sigma(\xi\eta) - N\bar\xi\bar\eta.$$

That is, in terms of mean-products, using $p'$ to denote the mean-product
for the arbitrary origin,

$$p = p' - \bar\xi\bar\eta.$$
In any case where the origin from which deviations have been
measured is not the mean, this correction must be used. It will
sometimes give a sensible correction even for work in the form of
        <pb n="206" />
Example i., and in that case, of course, the standard deviations
will also require reduction to the mean.
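The correction $p = p' - \bar\xi\bar\eta$ holds for any origin, which is easy to confirm on a few numbers. The Python sketch below uses six hypothetical pairs of observations (not taken from any table in this chapter), works out the mean product once from the arbitrary origins 10 and 4 and once directly from the means; the two results should agree. The function names are hypothetical.

```python
def mean_product_via_origin(xi, eta):
    """Mean product about the mean, computed from an arbitrary origin.

    xi, eta are deviations from any convenient origin; p' is the mean
    product about that origin, and p = p' - mean(xi) * mean(eta).
    """
    n = len(xi)
    xi_bar = sum(xi) / n
    eta_bar = sum(eta) / n
    p_dash = sum(a * b for a, b in zip(xi, eta)) / n
    return p_dash - xi_bar * eta_bar


def mean_product_direct(xs, ys):
    """Mean product of deviations taken from the means themselves."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n


# Hypothetical observations; deviations taken from origins 10 and 4
X = [12, 9, 15, 11, 8, 14]
Y = [5, 3, 6, 4, 2, 7]
via_origin = mean_product_via_origin([x - 10 for x in X], [y - 4 for y in Y])
print(via_origin, mean_product_direct(X, Y))
```

Both routes give $49/12 \approx 4.083$ for these pairs.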

As the arithmetical process of calculating the correlation co-
efficient from a grouped table is of great importance, we give two
illustrations, the first economic, the second biological.

Example ii., Table VIII.—The two variables are (1) X, the
percentage of males over 65 years of age in receipt of Poor-law
relief in 235 unions of a mainly rural character in England and
Wales; (2) Y, the ratio of the numbers of persons given relief “outdoors”
(in their own homes) to one “indoors” (in the workhouse).
The figures refer to a one-day count (1st August 1890, No. 36,
1890), and the table is one of a series that were drawn up with
the view to discussing the influence of administrative methods on
pauperism. (Economic Journal, vol. vi., 1896, p. 613.)

The arbitrary origin for X was taken at the centre of the fourth
column, or at 17.5 per cent.; for Y at the centre of the fourth
row, or 3.5. The following are the values found for the constants
of the single distributions:

$\bar\xi = -0.1532$ intervals $= -0.77$ per cent., whence $M_x = 16.73$ per cent.

$\sigma_x = 1.29$ intervals $= 6.45$ per cent.

$\bar\eta = +0.36$ intervals or units, whence $M_y = 3.86$.

$\sigma_y = 2.98$ units.

To calculate $\Sigma(\xi\eta)$, the value of $\xi\eta$ is first written in every
compartment of the table against the corresponding frequency,
treating the class-interval as the unit: these are the figures in
heavy type in Table VIII. In making these entries the sign of
the product may be neglected, but it must be remembered that
this sign will be positive in the upper left-hand and lower right-hand
quadrants, negative in the two others. The frequencies are
then collected as shown in columns 2 and 3 of Table VIIIa.,
being grouped according to the value and sign of $\xi\eta$. Thus for
$\xi\eta = 1$, the total frequency in the positive quadrants is $13 + 8.5 = 21.5$,
in the negative $14 + 6 = 20$; for $\xi\eta = 2$, $10 + 4.5 + 1 + 4.5 = 20$
in the positive quadrants, $5 + 2 + 1 + 3.5 = 11.5$ in the
negative, and so on. When columns 2 and 3 are completed, they
should first of all be checked to see that no frequency has been
dropped, which may be readily done by adding the totals
of these two columns to the frequency in row 4 and
column 4 of Table VIII. (the row and column for which $\xi\eta = 0$),
being careful not to count twice the frequency in the compartment
common to the two; this grand total must clearly be equal to the
total number of observations $N$, or 235 in the present case. The
algebraic sum of the frequencies in each line of columns 2 and 3 is
        <pb n="207" />
TABLE VIII. Theory of Correlation: Example ii. Old-age Pauperism and Proportion of Out-relief. (The frequencies are the figures printed in ordinary type. The numbers in heavy type are the deviation-products ($\xi\eta$).)
[Table body not legible in this copy. Columns: percentage of males over 65 in receipt of relief, in classes 0-5, 5-10, 10-15, 15-20, 20-25, 25-30, 30-35, 35-40. Rows: number relieved outdoors to one indoors, in unit classes from 0-1 to 18-19. Total frequency 235.]
Percentage in receipt of relief: mean 16.73 per cent., $\sigma_x$ 6.45 per cent. Out-relief ratio: mean 3.86, $\sigma_y$ 2.98.
        <pb n="208" />
TABLE VIIIA. Calculation of the Product-Sum $\Sigma(\xi\eta)$.
[Columns: (1) $\xi\eta$; (2) frequencies in the + quadrants; (3) frequencies in the - quadrants; (4) total, (2) - (3); (5) products, positive; (6) products, negative. The body is only partly legible in this copy. Totals: column 2, 100.5; column 3, 41.5; column 5, +334; column 6, -44. Check: 100.5 + 41.5 + 93 (the frequency in the row and column for which $\xi\eta = 0$) = 235.]
then entered in column 4, treating the frequencies in column 3 as if
they were themselves negative, and finally the figures of column 4
are multiplied by the values of $\xi\eta$ and the products entered in
column 5 or 6 according to sign. The algebraic sum of the totals
of columns 5 and 6 is $+290 = \Sigma(\xi\eta)$, whence $p' = \Sigma(\xi\eta)/N = 1.234$.
To find the value of $p$ we have, remembering that we are working
with class-intervals as the unit,

$$\bar\xi\bar\eta = -(0.153 \times 0.36) = -0.055$$
$$p = p' - \bar\xi\bar\eta = 1.234 + 0.055 = +1.289$$
$$r = \frac{1.289}{1.29 \times 2.98} = +0.34.$$
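Collecting frequencies by the value and sign of $\xi\eta$, as in Table VIIIa, is a way of organising the product-sum so that each distinct product is multiplied only once. The Python sketch below follows the same columns on a small hypothetical grouped table (Table VIII itself is not legible in this copy), with frequencies indexed by $(\xi, \eta)$ in class-intervals from the arbitrary origin. It also returns the frequency check described in the text, which must equal $N$. The function name `product_sum_by_grouping` is hypothetical.

```python
from collections import defaultdict


def product_sum_by_grouping(table):
    """Sigma(xi * eta * f) for a grouped table, collected as in Table VIIIa.

    `table` maps (xi, eta) -> frequency, xi and eta being deviations in
    class-intervals from the arbitrary origin.  Returns the product-sum and
    the frequency check (columns 2 + 3 plus the xi*eta == 0 compartments).
    """
    positive = defaultdict(float)   # column 2: frequencies in + quadrants
    negative = defaultdict(float)   # column 3: frequencies in - quadrants
    zero = 0.0                      # row and column for which xi*eta == 0
    for (xi, eta), f in table.items():
        prod = xi * eta
        if prod > 0:
            positive[prod] += f
        elif 0 > prod:
            negative[-prod] += f
        else:
            zero += f
    check = sum(positive.values()) + sum(negative.values()) + zero
    total = 0.0
    for size in sorted(set(positive) | set(negative)):
        net = positive.get(size, 0.0) - negative.get(size, 0.0)  # column 4
        total += size * net                                      # columns 5, 6
    return total, check


# Hypothetical 3 x 3 grouped table (not Table VIII); half-frequencies allowed
toy = {(-1, -1): 4, (0, -1): 3, (1, -1): 1,
       (-1, 0): 2, (0, 0): 6, (1, 0): 3,
       (-1, 1): 0.5, (0, 1): 2, (1, 1): 5.5}
direct = sum(xi * eta * f for (xi, eta), f in toy.items())
print(product_sum_by_grouping(toy), direct)
```

For the toy table the grouped and the direct product-sums both come to 8, and the frequency check returns the total of 27.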
The regression of pauperism on out-relief ratio is, reverting to
1 per cent. as the unit of pauperism instead of the class-interval,

        <pb n="209" />
$+0.34 \times 6.45/2.98 = 0.74$, and the regression equation accordingly
$x = 0.74\,y$, or

$$X = 13.9 + 0.74\,Y,$$

the standard error made in using the equation for estimating X
from Y being $\sigma_x\sqrt{1 - r^2} = 6.07$.
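The constants of Example ii can be reproduced from the summary values in the same way. The sketch below assumes that the class-interval of X is 5 per cent. (as the column headings 0-5, 5-10, ... imply) and that of Y is one unit, and it rounds $r$ and the regression to two places before use, as the text does.

```python
import math

# Example ii, summary values; X in class-intervals of 5 per cent., Y in units
N = 235
product_sum = 290            # Sigma(xi * eta) about the arbitrary origin
xi_bar, eta_bar = -0.1532, 0.36
sigma_x_int, sigma_y = 1.29, 2.98
interval = 5.0               # per cent. per class-interval of X (assumed)
mean_x, mean_y = 16.73, 3.86

p_dash = product_sum / N                    # mean product about the origin
p = p_dash - xi_bar * eta_bar               # reduced to the mean
r = round(p / (sigma_x_int * sigma_y), 2)   # sigmas in the working units

sigma_x = sigma_x_int * interval            # back to per cent.: 6.45
b_xy = round(r * sigma_x / sigma_y, 2)      # regression of X on Y
intercept = round(mean_x - b_xy * mean_y, 1)
se_x = round(sigma_x * math.sqrt(1 - r * r), 2)
print(round(p_dash, 3), round(p, 3), r, b_xy, intercept, se_x)
```

It gives $p' \approx 1.234$, $p \approx 1.289$, $r = 0.34$, a regression of 0.74, the constant 13.9 and a standard error of 6.07 per cent., matching the text.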

This is the equation of greatest practical interest, telling us
that, as we pass from one district to another, a rise of 1 in the
ratio of the numbers relieved in their own homes to the numbers
relieved in the workhouse corresponds on an average to a rise of
0-74 in the percentage in receipt of relief. The result is such as
to create a presumption in favour of the view that the giving of
out-relief tends to increase the numbers relieved, and this can be
taken as a working hypothesis for further investigation.

The student should work out the second regression equation,
and check both by calculating the means of the principal rows
and columns, and drawing a diagram like figs. 36, 37, and 38.

Example iii., Table IX.—(Unpublished data ; measurements by
G. U. Yule.) The two variables are (1) X, the length of a mother-
frond of duckweed (Lemna minor); (2) Y, the length of the
daughter-frond. The mother-frond was measured when the
daughter-frond separated from it, and the daughter-frond when
its first daughter-frond separated. Measures were taken from
camera drawings made with the Zeiss-Abbé camera under a low
power, the actual magnification being 24 : 1. The units of length
in the tabulated measurements are millimetres on the drawings.

The arbitrary origin for both X and Y was taken at 105 mm.
The following are the values found for the constants of the single
distributions:

$\bar\xi = -1.058$ intervals $= -6.3$ mm.; $M_x = 98.7$ mm. on drawing $= 4.11$ mm. actual.

$\sigma_x = 2.828$ intervals $= 17.0$ mm. on drawing $= 0.707$ mm. actual.

$\bar\eta = -0.203$ intervals $= -1.2$ mm.; $M_y = 103.8$ mm. on drawing $= 4.32$ mm. actual.

$\sigma_y = 3.084$ intervals $= 18.5$ mm. on drawing $= 0.771$ mm. actual.

The values of $\xi\eta$ are entered in every compartment of the
table as before, and the frequencies then collected, according to
the magnitude and sign of $\xi\eta$, in columns 2 and 3 of Table IXa.
The entries in these two columns are next checked by adding to
their totals the frequency in the row and column for which $\xi\eta$ is
zero, and seeing that the sum gives the total number of observations
(266). The numbers in column 4 are given by deducting the
entries in column 3 from those in column 2. The totals so
obtained are multiplied by $\xi\eta$ (column 1) and the products entered
        <pb n="210" />
TABLE IXA.
[Columns as in Table VIIIa: (1) $\xi\eta$; (2) frequencies in the + quadrants; (3) frequencies in the - quadrants; (4) total, (2) - (3); (5) products, positive; (6) products, negative. The body is only partly legible in this copy. Totals: column 2, 145.5; column 3, 49; column 5, +1528; column 6, -8.5. Check: 145.5 + 49 + 71.5 (the frequency in the row and column for which $\xi\eta = 0$) = 266.]
in column 5 or 6 according to sign. The algebraic sum of the
totals of these two columns gives $\Sigma(\xi\eta) = +1519.5$. Dividing
by 266, $p' = 5.712$. But $\bar\xi\bar\eta = +1.058 \times 0.203 = +0.215$;
therefore $p = 5.712 - 0.215 = 5.497$.

$$r = \frac{5.497}{2.828 \times 3.084} = +0.63.$$
        <pb n="211" />
TABLE IX. Theory of Correlation: Illustration iii. Correlation between (1) length of mother-frond, (2) length of daughter-frond, in Lemna minor. [Unpublished data; G. U. Yule.] (The frequencies are the figures printed in ordinary type. The numbers in heavy type are the deviation-products ($\xi\eta$).)
[Table body not legible in this copy. Columns: (1) length of mother-frond (mm. of camera drawing enlarged 24 : 1). Rows: (2) length of daughter-frond, in 6 mm. classes from 60-66 to 162-168. Total frequency 266.]
        <pb n="212" />
        <pb n="213" />

The regression of daughter-frond on mother-frond is 0.69 (a
value which will not be altered by altering the units of measurement
for both mother- and daughter-fronds, as such an alteration
will affect both standard deviations equally). Hence the regression
equation giving the average actual length (in millimetres)
of daughter-fronds for mother-fronds of actual length X is

$$Y = 1.48 + 0.69\,X.$$
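A similar check for Example iii is sketched below. It assumes a class-interval of 6 mm. on the drawing (the rows of Table IX run 60-66, 66-72, ...) and a standard deviation of 3.084 intervals for Y, a figure only partly legible in this copy and inferred from the quoted 18.5 mm. on the drawing and 0.771 mm. actual. The `assert` line illustrates the remark in the text: a change of unit common to both variables cancels from the regression coefficient.

```python
import math

# Example iii summary values; class-interval 6 mm. on a drawing enlarged 24:1
N = 266
product_sum = 1519.5
xi_bar, eta_bar = -1.058, -0.203
sigma_x_int, sigma_y_int = 2.828, 3.084     # sigma_y partly assumed
interval_mm, magnification = 6.0, 24.0
mean_x_actual, mean_y_actual = 4.11, 4.32   # mm., actual

p_dash = product_sum / N
p = p_dash - xi_bar * eta_bar
r = round(p / (sigma_x_int * sigma_y_int), 2)

# The same scale factor applied to both sigmas leaves r * sigma_y / sigma_x unchanged
b_yx = round(r * sigma_y_int / sigma_x_int, 2)
scale = interval_mm / magnification         # class-intervals -> actual mm.
assert math.isclose(r * (sigma_y_int * scale) / (sigma_x_int * scale),
                    r * sigma_y_int / sigma_x_int)
intercept = round(mean_y_actual - b_yx * mean_x_actual, 2)
print(round(p_dash, 3), round(p, 3), r, b_yx, intercept)
```

It gives $p' \approx 5.712$, $p \approx 5.498$ (5.497 in the text, which subtracts the rounded 0.215), $r = 0.63$, a regression of 0.69 and the constant 1.48.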

We again leave it to the student to work out the second
regression equation giving the average length of mother-fronds
for daughter-fronds of length Y, and to check the whole work
by a diagram showing the lines of regression and the means of
arrays for the central portion of the table.

17. The student should be careful to remember the following
points in working:

(1) To give $p'$ and $\bar\xi\bar\eta$ their correct signs in finding the true
mean deviation-product $p$.

(2) To express $\sigma_x$ and $\sigma_y$ in terms of the class-interval as a
unit in the value of $r = p/(\sigma_x\sigma_y)$, for these are the units in terms
of which $p$ has been calculated.

(3) To use the proper units for the standard deviations (not
class-intervals in general) in calculating the coefficients of
regression: in forming the regression equation in terms of the
absolute values of the variables, for example, as above, the work
will be wrong unless means and standard deviations are expressed
in the same units.

Further, it must always be remembered that correlation
coefficients, like all other statistical measures, are subject to
fluctuations of sampling (cf. Chap. III. §§ 7, 8). If we write
on cards a series of pairs of strictly independent values of $x$ and
$y$ and then work out the correlation coefficient for samples of,
say, 40 or 50 cards taken at random, we are very unlikely ever
to find $r = 0$ absolutely, but will find a series of positive and
negative values centring round 0. No great stress can therefore
be laid on small, or even on moderately large, values of $r$ as
indicating a true correlation if the numbers of observations be
small. For instance, if $N = 36$, a value of $r = \pm 0.5$ may be
merely a chance result (though a very infrequent one); if
$N = 100$, $r = \pm 0.3$ may similarly be a mere fluctuation of
sampling, though again an infrequent one. If $N = 900$, a value
of $r = \pm 0.1$ might occur as a fluctuation of sampling of the same
degree of infrequency. The student must therefore be careful in
interpreting his coefficients. (See Chap. XVII. § 15.)
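The card experiment described above is easy to imitate. The Python sketch below draws pairs of independent standard normal values (the choice of distribution is an assumption; the text speaks only of strictly independent values), computes $r$ for many samples of size $N$, and reports how widely $r$ scatters about 0 together with the share of samples reaching the values of $r$ quoted in the text. For independent variables the spread should be close to $1/\sqrt{N}$, so $\pm 0.5$ at $N = 36$, $\pm 0.3$ at $N = 100$ and $\pm 0.1$ at $N = 900$ each lie about three times that spread from zero. The helper names are hypothetical.

```python
import math
import random
import statistics


def pearson_r(xs, ys):
    """Correlation coefficient: mean product of deviations over sigma_x * sigma_y."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    p = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    return p / (sx * sy)


def sample_rs(n, trials, rng):
    """r for `trials` independent samples, each of n independent (x, y) pairs."""
    return [pearson_r([rng.gauss(0, 1) for _ in range(n)],
                      [rng.gauss(0, 1) for _ in range(n)])
            for _ in range(trials)]


rng = random.Random(1)
for n, r_quoted in [(36, 0.5), (100, 0.3), (900, 0.1)]:
    rs = sample_rs(n, 500, rng)
    spread = statistics.pstdev(rs)
    share = sum(abs(r) >= r_quoted for r in rs) / len(rs)
    print(f"N={n}: sd of r {spread:.3f} vs 1/sqrt(N) {1 / math.sqrt(n):.3f}; "
          f"share with |r| >= {r_quoted}: {share:.3f}")
```

The exact shares depend on the random seed; with 500 samples per size they should all be small, of the order of a few in a thousand.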

Finally, it should be borne in mind that any coefficient, e.g. the
coefficient of correlation or the coefficient of contingency, gives
        <pb n="214" />

only a part of the information afforded by the original data or
the correlation table. The correlation table itself, or the original
data if no correlation table has been compiled, should always be
given, unless considerations of space or of expense absolutely
preclude the adoption of such a course.

REFERENCES.

The theory of correlation was first developed on definite assumptions
as to the form of the distribution of frequency, the so-called “normal
distribution” (Chap. XVI.) being assumed. In (1) Bravais introduced
the product-sum, but not a single symbol for a coefficient of correlation.
Sir Francis Galton, in (2), (3), and (4), developed the practical method,
determining his coefficient (Galton’s function, as it was termed at first)
graphically. Edgeworth developed the theoretical side further in (5),
and Pearson introduced the product-sum formula in (6), both memoirs
being written on the assumption of a “normal” distribution of frequency
(cf. Chap. XVI.). The method used in the preceding chapter
is based on (7) and (8).

(1) BRAVAIS, A., “Analyse mathématique sur les probabilités des erreurs de
situation d’un point,” Acad. des Sciences: Mémoires présentés par divers
savants, IIe série, t. ix., 1846, p. 255.

(2) GALTON, FRANCIS, “Regression towards Mediocrity in Hereditary
Stature,” Jour. Anthrop. Inst., vol. xv., 1886, p. 246.

(3) GALTON, FRANCIS, “Family Likeness in Stature,” Proc. Roy. Soc.,
vol. xl., 1886, p. 42.

(4) GALTON, FRANCIS, “Correlations and their Measurement,” Proc. Roy.
Soc., vol. xlv., 1888, p. 135.

(5) EDGEWORTH, F. Y., “On Correlated Averages,” Phil. Mag., 5th Series,
vol. xxxiv., 1892, p. 190.

(6) PEARSON, KARL, “Regression, Heredity, and Panmixia,” Phil. Trans.
Roy. Soc., Series A, vol. clxxxvii., 1896, p. 253.

(7) YULE, G. U., “On the significance of Bravais’ Formulæ for Regression,
etc., in the case of Skew Correlation,” Proc. Roy. Soc., vol. lx., 1897,
p. 477.

(8) YULE, G. U., “On the Theory of Correlation,” Jour. Roy. Stat. Soc.,
vol. lx., 1897, p. 812.

(9) DARBISHIRE, A. D., “Some Tables for illustrating Statistical Correlation,”
Mem. and Proc. of the Manchester Lit. and Phil. Soc., vol. li.,
1907. (Tables and diagrams illustrating the meaning of values of the
correlation coefficient from 0 to 1 by steps of a twelfth.)

Reference may also be made here to:

(10) EDGEWORTH, F. Y., “On a New Method of reducing Observations
relating to several Quantities,” Phil. Mag., 5th Series, vol. xxiv., 1887,
p. 222, and vol. xxv., 1888, p. 184. (A method of treating correlated
variables differing entirely from that described in the preceding
chapter, and based on the use of the median: the method involves
the use of trial and error to some extent. For some illustrations see
F. Y. Edgeworth and A. L. Bowley, Jour. Roy. Stat. Soc., vol. lxv.,
1902, p. 341 et seq.)

References to memoirs on the theory of non-linear regression are given
at the end of Chapter X.

        <pb n="215" />
EXERCISES.

1. Find the correlation-coefficient and the equations of regression for the
following values of X and Y.

[As a matter of practice it is never worth calculating a correlation-coefficient
for so few observations: the figures are given solely as a short example on
which the student can test his knowledge of the work.]

2. The following figures show, for the districts of Example i., the ratios of
the numbers of paupers in receipt of outdoor relief to the numbers in receipt
of relief in the workhouse. Find the correlations between the out-relief ratio
and (1) the estimated earnings of agricultural labourers; (2) the percentage
of the population in receipt of relief.

District   Ratio    District   Ratio    District   Ratio
    1       6.40       14       7.50       27       2.97
    2       4.04       15       4.44       28       5.38
    3       7.90       16       8.34       29       3.24
    4       3.31       17       0.69       30       7.61
    5       7.85       18       9.89       31       5.87
    6       0.45       19       4.00       32       5.50
    7      10.00       20       6.02       33       3.58
    8       4.43       21       8.27       34       6.93
    9       4.78       22       1.58       35       6.02
   10       4.73       23      16.04       36       4.92
   11       6.66       24       1.96       37       4.64
   12       1.22       25       9.28       38      10.56
   13       4.27       26       8.72
3. Verify the following data for the under-mentioned tables of the preceding
chapter. Calculate the means of rows and columns and draw diagrams showing
the lines of regression, as in figs. 36-39, for one or two cases at least.

[Table of data for Tables I., II., III., IV. and VI. of the preceding chapter, giving for each the mean of X, the mean of Y, the standard deviation of X, the standard deviation of Y, a further row, and the coefficient of contingency for the grouping stated below. The figures are not legible in this copy.]
        <pb n="216" />

In calculating the coefficient of contingency (coefficient of mean square
contingency) use the following groupings, so as to avoid small scattered frequencies
at the extremities of the tables and also excessive arithmetic:

I. Group together (1) two top rows, (2) three bottom rows, (3) two first
columns, (4) four last columns, leaving centre of table as it stands.

II. Regroup by ten-year intervals (15-, 25-, 35-, etc.) for both husband and
wife, making the last group “65 and over.”

III. Regroup by 2-inch intervals, 58.5-60.5, etc., for father, 59.5-61.5,
etc., for son. If a 3-inch grouping be used (58.5-61.5, etc., for both father and
son), the coefficient of mean square contingency is 0.465. [Both results cited
from Pearson, ref. 1 of Chap. V.]

IV. For cols., group 1+2, 3+4, . . . , 11+12, 13 and upwards. Rows,
0, 1+2, 3+4, . . . , 9+10, 11 and upwards.

VI. For cols., group all up to 494.5 and all over 521.5, leaving central cols.
Rows singly up to 20: then 20-28, 28-44, 44-56, 56 upwards.
        <pb n="217" />
        CHAPTER X.
CORRELATION: ILLUSTRATIONS AND PRACTICAL
METHODS.
1. Necessity for careful choice of variables before proceeding to calculate r.
2-8. Illustration i.: Causation of pauperism. 9-10. Illustration
ii.: Inheritance of fertility. 11-13. Illustration iii.: The weather
and the crops. 14. Correlation between the movements of two
variables: (a) Non-periodic movements: Illustration iv.: Changes
in infantile and general mortality. 15-17. (b) Quasi-periodic movements:
Illustration v.: The marriage-rate and foreign trade.
18. Elementary methods of dealing with cases of non-linear regression.
19. Certain rough methods of approximating to the correlation
coefficient. 20-22. The correlation ratio.
1. The student (especially the student of economic statistics, to
whom this chapter is principally addressed) should be careful to
note that the coefficient of correlation, like an average or a
measure of dispersion, only exhibits in a summary and compre-
hensible form one particular aspect of the facts on which it is
based, and the real difficulties arise in the interpretation of the
coefficient when obtained. The value of the coefficient may be
consistent with some given hypothesis, but it may be equally
consistent with others; and not only are care and judgment
essential for the discussion of such possible hypotheses, but also
a thorough knowledge of the facts in all other possible aspects.
Further, care should be exercised from the commencement in the
selection of the variables between which the correlation shall be
determined. The variables should be defined in such a way as
to render the correlations as readily interpretable as possible,
and, if several are to be dealt with, they should afford the answers
to specific and definite questions. Unfortunately, the field of
choice is frequently very much limited, by deficiencies in the
available data and so forth, and consequently practical possibilities
as well as ideal requirements have to be taken into account. No
general rules can be laid down, but the following are given as
illustrations of the sort of points that have to be considered.
        <pb n="218" />

2. Illustration i.: It is required to throw some light on the
variations of pauperism in the unions (unions of parishes) of
England. (Cf. Yule, ref. 2.)

One table (Table VIII.) bearing on a part of this question, viz.
the influence of the giving of out-relief on the proportion of the
aged in receipt of relief, was given in Chap. IX. (p. 183). The
question was treated by correlating the percentage of the aged
relieved in different districts with the ratio of numbers relieved
outdoors to the numbers in the workhouse. Is such a method
the best possible ?

On the whole, it would seem better to correlate changes in
pauperism with changes in various possible factors. If we say
that a high rate of pauperism in some district is due to lax
administration, we presumably mean that as administration
became lax, pauperism rose; or that if administration were more
strict, pauperism would decrease ; if we say that the high pauper-
ism is due to the depressed condition of industry, we mean that
when industry recovers, pauperism will fall. When we say, in
fact, that any one variable is a factor of pauperism, we mean
that changes in that variable are accompanied by changes in the
percentage of the population in receipt of relief, either in the
same or the reverse direction. It will be better, therefore, to
deal with changes in pauperism and possible factors. The next
question is what factors to choose.

3. The possible factors may be grouped under three heads:

(a) Administration. Changes in the method or strictness of
administration of the law.

(b) Environment. Changes in economic conditions (wages,
prices, employment), social conditions (residential or industrial
character of the district, density of population, nationality of
population), or moral conditions (as illustrated, e.g., by the statistics
of crime).

(c) Age Distribution. The percentage of the population between
given age-limits in receipt of relief increases very rapidly with old
age, the actual figures given by one of the only two then existing
returns of the age of paupers being: 2 per cent. under age 16,
1 per cent. over 16 but under 65, 20 per cent. over 65. (Return
36, 1890.)

It is practically impossible to deal with more than three factors,
one from each of the above groups, or four variables altogether,
including the pauperism itself. What shall we take, then,
as representative variables, and how shall we best measure
“pauperism”?

4. Pauperism. The returns give (a) cost, (b) numbers relieved.
It seems better to deal with (b) (as in the illustration of Table
        <pb n="219" />
VIII, Chap. IX.), as numbers are more important than cost from
the standpoint of the moral effect of relief on the population.
The returns, however, generally include both lunatics and vagrants
in the totals of persons relieved ; and as the administrative methods
of dealing with these two classes differ entirely from the methods
applicable to ordinary pauperism, it seems better to alter the
official total by excluding them. Returns are available giving
the numbers in receipt of relief on 1st January and 1st July;
there does not seem to be any special reason for taking the one
return rather than the other, but the return for 1st January was
actually used. The percentage of the population in receipt of
relief on 1st January 1871, 1881, and 1891 (the three census
years), less lunatics and vagrants, was therefore tabulated for each
union. (The investigation was carried out in 1898.)
5. Administration. The most important point here, and one
that lends itself readily to statistical treatment, is the relative
proportion of indoor and outdoor relief (relief in the workhouse
and relief in the applicant’s home). The first question is,
again, shall we measure this proportion by cost or by numbers?
The latter seems, as before, the simpler and more important ratio
for the present purpose, though some writers have preferred the
statement in terms of expenditure (e.g. Mr Charles Booth, Aged
Poor—Condition, 1894). If we decide on the statement in terms
of numbers, we still have the choice of expressing the proportion (1)
as the ratio of numbers given out-relief to numbers in the work-
house, or (2) as the percentage of numbers given out-relief on
the total number relieved. The former method was chosen,
partly on the simple ground that it had already been used in an
earlier investigation, partly on the ground that the use of the
ratio separates the higher proportions of out-relief more clearly
from each other, and these differences seem to have significance.
Thus a union with a ratio of 15 outdoor paupers to one indoor
seems to be materially different from one with a ratio of, say, 10
to 1; but if we take, instead of the ratios, the percentages of
outdoor to total paupers, the figures are 94 per cent. and 91 per
cent. respectively, which are so close that they will probably fall
into the same array. The ratio of numbers in receipt of outdoor
relief to the numbers in the workhouse, in every union, was
therefore tabulated for 1st January in the census years 1871, 1881,
1891.
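The argument about ratios and percentages rests on the conversion percentage $= 100\,q/(1 + q)$, where $q$ is the number of outdoor paupers per indoor pauper. The short Python sketch below tabulates it; the flattening of the percentage scale at high ratios is the point the text makes. The function name is hypothetical.

```python
def outdoor_percentage(q):
    """Percentage of all paupers relieved outdoors, given q outdoor per indoor."""
    return 100 * q / (1 + q)


for q in (1, 2, 10, 15):
    print(q, round(outdoor_percentage(q)))
```

Ratios of 10 and 15 come out at 91 and 94 per cent., the figures quoted in the text, while ratios of 1 and 2 give 50 and 67 per cent.: a difference of one in the ratio moves the percentage by about 17 points at the low end, but a difference of five moves it by only about 3 points at the high end.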

6. Environment.—This is the most difficult factor of all to deal
with. In Mr Booth’s work the factors tabulated were (1) persons
per acre ; (2) percentage of population living two or more to a
room, i.e. “overcrowding”; (3) rateable value per head (Aged Poor—
Condition). The data relating to overcrowding were first collected
        <pb n="220" />

at the census of 1891, and are not available for earlier years.
Some trial was made of rateable value per head, but with not
very satisfactory results. For any given year, and for a group of
unions of somewhat similar character, e.g. rural, the rateable value
per head appears to be highly (negatively) correlated with the
pauperism, but changes in the two are not very highly correlated :
probably the movements of assessments are sluggish and irregular,
especially in the case of falling assessments in rural unions, and
do not correspond at all accurately with the real changes in the
value of agricultural land. After some consideration, it was
decided to use a very simple index to the changing fortunes of a
district, viz. the movement of the population itself. If the
population of a district is increasing at a rate above the average,
this is prima facie evidence that its industries are prospering ; if
the population is decreasing, or not increasing as fast as the
average, this strongly suggests that the industries are suffering
from a temporary lack of prosperity or permanent decay. The
population of every union was therefore tabulated for the censuses
of 1871, 1881, 1891.

7. Age Distribution.—As already stated, the figures that are
known clearly indicate a very rapid rise of the percentage relieved
after 65 years of age. The percentage of the population over 65
years of age was therefore worked out for every union and tabu-
lated from the same three censuses. This is not, of course,
at all a complete index to the composition of the population as
affecting the rate of pauperism, which is sensibly dependent on
the proportion of the two sexes, and the numbers of children as
well. As the percentage in receipt of relief was, however, 20 per
cent. for those over 65, and only 1.2 per cent. for those under that
age, it is evidently a most important index. (A more complete
method might have been used by correcting the observed rate of
pauperism to the basis of a standard population with given numbers
of each age and sex: cf. below, Chap. XI. pp. 223-25.)

8. The changes in each of the four quantities that had been
tabulated for every union were then measured by working out the
ratios for the intercensal decades 1871-81 and 1881-91, taking
the value in the earlier year as 100 in each case. The percentage
ratios so obtained were taken as the four variables. Further, as
the conditions are and were very different for rural and for urban
unions, it seemed very desirable to separate the unions into groups
according to their character. But this cannot be done with any
exactness: the majority of unions are of a mixed character, con-
sisting, say, of a small town with a considerable extent of the
surrounding country. It might seem best to base the classification
on returns of occupations, e.g. the proportions of the population
        <pb n="221" />
engaged in agriculture, but the statistics of occupations are not
given in the census for individual unions. Finally, it was decided
to use a classification by density of population, the grouping used
being—Rural, 0.3 person per acre or less; Mixed, more than
0.3 but not more than 1 person per acre; Urban, more than 1 person
per acre. The metropolitan unions were also treated by themselves.
The limit 0.3 for rural unions was suggested by the
density of those agricultural unions the conditions in which
were investigated by the Labour Commission (the unions of
Table VII., Chap. IX.): the average density of these was 0.25,
and 34 of the 38 were under 0.3. The lower limit of density for
urban unions—1 per acre—was suggested by a grouping of Mr
Booth's (group xiv.): of course 1 person per acre is not a density
associated with an urban district in the ordinary sense of the
term, but a country district cannot reach this density unless it
include a small town or portion of a town, i.e. unless a large
proportion of its inhabitants live under urban conditions.
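The two mechanical steps of § 8, expressing each quantity as a percentage of its value at the earlier census and assigning a union to a density group, can be sketched as follows. This is only an illustration under stated assumptions: the function names and the figures are invented, density is taken as persons per acre, and the metropolitan unions, which were treated separately, are not handled.

```python
def percentage_ratio(earlier, later):
    """Later census value expressed with the earlier value taken as 100."""
    return 100.0 * later / earlier


def density_class(persons_per_acre):
    """Rural: 0.3 or less; Mixed: over 0.3 up to 1; Urban: over 1."""
    if persons_per_acre > 1.0:
        return "Urban"
    if persons_per_acre > 0.3:
        return "Mixed"
    return "Rural"


# Invented figures for a single union, purely for illustration.
print(percentage_ratio(12000, 13200))                                # 110.0
print(density_class(0.25), density_class(0.6), density_class(1.4))  # Rural Mixed Urban
```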

The method by which the relations between four variables are
discussed is fully described in Chapter XII. : at the present stage
it can only be stated that the discussion is based on the correlations
between all the possible (6) pairs that can be formed from the four
variables.
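The six pairs are simply all the ways of choosing 2 of the 4 variables. A minimal Python sketch of that step, assuming each variable is held as a list of percentage ratios with one entry per union in a common order; the variable names and figures are invented, and the coefficient is the product-moment formula of Chapter IX. with N in the denominators.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Product-moment correlation coefficient (moments divided by N)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sxx = sum((x - mx) ** 2 for x in xs) / n
    syy = sum((y - my) ** 2 for y in ys) / n
    return sxy / sqrt(sxx * syy)


def all_pair_correlations(variables):
    """Map each pair of variable names to their correlation coefficient."""
    return {(a, b): pearson(variables[a], variables[b])
            for a, b in combinations(variables, 2)}


# Invented percentage ratios for five unions (hypothetical data).
ratios = {
    "pauperism": [88, 95, 102, 91, 79],
    "out_relief": [80, 97, 105, 90, 72],
    "old_age": [104, 101, 99, 107, 110],
    "population": [112, 103, 98, 109, 121],
}
for pair, r in all_pair_correlations(ratios).items():
    print(pair, round(r, 2))   # six lines, one per pair
```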

9. Illustration ii.—The subject of investigation is the inheritance
of fertility in man. (Cf. Pearson and others, ref. 3.) One table,
from the memoir cited, was given as an example in the last chapter
(Table IV.).

Fertility in man (i.e. the number of children born to a given pair)
is very largely influenced by the age of husband and wife at
marriage (especially the latter), and by the duration of marriage.
It is desired to find whether it is also influenced by the heritable
constitution of the parents, i.e. whether, allowance being made for
the effect of such disturbing causes as age and duration of marriage,
fertility is itself a heritable character.

The effect of duration of marriage may be largely eliminated
by excluding all marriages which have not lasted, say, 15 years
at least. This will rather heavily reduce the number of records
available, but will leave a sufficient number for discussion. It
would be desirable to eliminate the effect of late marriages in
the same way by excluding all cases in which, say, husband was
over 30 years of age or wife over 25 (or even less) at the time
of marriage. But, unfortunately, this is impossible ; the age of
the wife—the most important factor—is only exceptionally given
in peerages, family histories, and similar works, from which the
data must be compiled. All marriages must therefore be
included, whatever the age of the parents at marriage, and the
        <pb n="222" />
effect of the varying age at marriage must be estimated
afterwards.

10. But the correlation between (1) number of children of a
woman and (2) number of children of her daughter will be further
affected according as we include in the record all her available
daughters or only one. Suppose, e.g., the number of children in
the first generation is 5 (say the mother and her brothers and
sisters), and that she has three daughters with 0, 2, and 4
children respectively: are we to enter all three pairs (5, 0),
(5, 2), (5, 4) in the correlation-table, or only one pair? If the
latter, which pair? For theoretical simplicity the second process
is distinctly the best (though it still further limits the available
data). If it be adopted, some regular rule will have to be made
for the selection of the daughter whose fertility shall be entered
in the table, so as to avoid bias: the first daughter married
for whom data are given, and who fulfils the conditions as to
duration of marriage, may, for instance, be taken in every case.
(For a much more detailed discussion of the problem, and the
allied problems regarding the inheritance of fertility in the horse,
the student is referred to the original.)
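One possible coding of the selection rule suggested above, assuming each daughter's record carries her order of marriage among her sisters, the duration of her marriage and her number of children. The record layout and names are hypothetical; the 15-year limit is the one proposed in § 9. Taking the first qualifying daughter, rather than (say) the best-recorded one, is meant to keep the choice independent of the fertility being measured.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Daughter:
    marriage_order: int            # 1 = first of the sisters to marry
    years_married: Optional[int]   # None if the duration is not given
    children: Optional[int]        # None if the number is not given


def select_daughter(daughters, min_years=15):
    """Return the first-married daughter with complete data whose marriage
    has lasted at least `min_years`, or None if no daughter qualifies."""
    for d in sorted(daughters, key=lambda d: d.marriage_order):
        complete = d.years_married is not None and d.children is not None
        if complete and d.years_married >= min_years:
            return d
    return None


# The case in the text: mother's generation 5, daughters with 0, 2 and 4
# children (the marriage orders and durations here are invented).
daughters = [Daughter(1, 12, 0), Daughter(2, 20, 2), Daughter(3, 18, 4)]
chosen = select_daughter(daughters)
print((5, chosen.children))   # (5, 2): the eldest fails the duration rule
```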

11. Illustration iii.—The subject for investigation is the
relation between the bulk of a crop (wheat and other cereals,
turnips and other root crops, hay, etc.), and the weather. (Cf.
Hooker, ref. 7.)

Produce-statistics for the more important crops of Great
Britain have been issued by the Board of Agriculture since
1885: the figures are based on estimates of the yield furnished
by official local estimators all over the country. Estimates are
published for separate counties and for groups of counties
(divisions). But the climatic conditions vary so much over the
United Kingdom that it is better to deal with a smaller area,
more homogeneous from the meteorological standpoint. On the
other hand, the area should not be too small; it should be large
enough to present a representative variety of soil. The group
of eastern counties, consisting of Lincoln, Hunts, Cambridge,
Norfolk, Suffolk, Essex, Bedford, and Hertford, was selected as
fulfilling these conditions. The group includes the county with
the largest acreage of each of the ten crops investigated, with
the single exception of permanent grass.

12. The produce of a crop is dependent on the weather of
a long preceding period, and it is naturally desired to find the
influence of the weather at all successive stages during this
period, and to determine, for each crop, which period of the
year is of most critical importance as regards weather. It must
be remembered, however, that the times of both sowing and
        <pb n="223" />
harvest are themselves very largely dependent on the weather,
and consequently, on an average of many years, the limits of
the critical period will not be very well defined. If, therefore,
we correlate the produce of the crop (X) with the characteristics
of the weather (Y) during successive intervals of the year, it
will be as well not to make these intervals too short. It was
accordingly decided to take successive groups of 8 weeks,
overlapping each other by 4 weeks, i.e. weeks 1-8, 5-12, etc.
Correlation coefficients were thus obtained at 4-week intervals,
but based on 8 weeks' weather.
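The overlapping periods can be generated mechanically. A small sketch, assuming a 52-week year with weeks numbered from 1 and no period running over into the following year (the text does not say how the end of the year was handled); `weekly_totals` is an illustrative helper for summing, say, weekly rainfall within each period.

```python
def overlapping_periods(n_weeks=52, width=8, step=4):
    """(first, last) week of each period: (1, 8), (5, 12), (9, 16), ..."""
    periods = []
    first = 1
    while n_weeks >= first + width - 1:
        periods.append((first, first + width - 1))
        first += step
    return periods


def weekly_totals(weekly_values, periods):
    """Sum a list of weekly values (week 1 at index 0) over each period."""
    return [sum(weekly_values[first - 1:last]) for first, last in periods]


periods = overlapping_periods()
print(periods[:3], len(periods))   # [(1, 8), (5, 12), (9, 16)] 12
```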

13. It remains to be decided what characteristics of the weather
are to be taken into account. The rainfall is clearly one factor
of great importance, temperature is another, and these two will
afford quite enough labour for a first investigation. The weekly
rainfalls were averaged for eight stations within the area, and
the average taken as the first characteristic of the weather.
Temperatures were taken from the records of the same stations.
The average temperatures, however, do not give quite the sort
of information that is required: at temperatures below a certain
limit (about 42° Fahr.) there is very little growth, and the
growth increases in rapidity as the temperature rises above this
point (within limits). It was therefore decided to utilise the
figures for “accumulated temperatures above 42° Fahr.” i.e.
the total number of day-degrees above 42° during each of the
8-weekly periods, as the second characteristic of the weather ;
these “accumulated temperatures,” moreover, show much larger
variations than mean temperatures.
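The definition of accumulated temperature amounts to a thresholded sum. A minimal sketch, assuming one mean temperature per day in degrees Fahrenheit; the exact rule by which the published day-degree figures were derived from the station records is not given here, so this shows only the idea and does not reproduce those figures.

```python
def accumulated_temperature(daily_means_f, base=42.0):
    """Day-degrees above `base` °F: each day adds (t - base) if t exceeds
    the base, and nothing otherwise."""
    return sum(max(0.0, t - base) for t in daily_means_f)


week = [40, 41, 44, 47, 50, 43, 42]            # invented daily means, °F
print(accumulated_temperature(week))            # 2 + 5 + 8 + 1 = 16.0
print(sum(week) / len(week))                    # plain mean, about 43.86
```

A day below 42° adds nothing however cold it is, which is why the accumulated figure follows growth more closely than the plain mean.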

The student should refer to the original for the full dis-
cussion as to data. The method of treating the correlations
between three variables, based on the three possible correlations
between them, is described in Chapter XII.

14. Problems of a somewhat special kind arise when dealing
with the relations between simultaneous values of two variables
which have been observed during a considerable period of time,
for the more rapid movements will often exhibit a fairly close
consilience, while the slower changes show no similarity. The two
following examples will serve as illustrations of two methods which
are generally applicable to such cases.

Illustration iv.—Fig. 41 exhibits the movements of (1) the
infantile mortality (deaths of infants under 1 year of age per 1000
births in the same year) ; (2) the general mortality (deaths at all
ages per 1000 living) in England and Wales during the period
1838-1904. A very cursory inspection of the figure shows that
when the infantile mortality rose from one year to the next
the general mortality also rose, as a rule; and similarly, when the
        <pb n="224" />
infantile mortality fell, the general mortality also fell. There
were, in fact, only five or six exceptions to this rule during the
whole period under review. The correlation between the annual
values of the two mortalities would nevertheless not be very high,
as the general mortality has been falling more or less steadily since
1875 or thereabouts, while the infantile mortality attained almost
a record value in 1899. During a long period of time the correla-
tion between annual values may, indeed, very well vanish, for the
two mortalities are affected by causes which are to a large extent
different in the two cases. To exhibit, therefore, the closeness of the
relation between infantile and general mortality, for such causes
as show marked changes between one year and the next, it will be
best to proceed by correlating the annual changes, and not the annual
values. The work would be arranged in the following form (only
sufficient years being given to exhibit the principle of the process),
and the correlation worked out between the figures of cols. 3 and 5.
    1.        2.              3.              4.              5.
   Year   Infantile       Increase or     General         Increase or
          Mortality per   Decrease from   Mortality per   Decrease from
          1000 Births.    Year before.    1000 living.    Year before.

   1838      150              —             22.4              —
   1839      151             +1             21.8            -0.6
   1840      154             +3             22.9            +1.1
   1841      145             -9             21.6            -1.3
   1842      152             +7             21.7            +0.1
   1843      150             -2             21.2            -0.5

For the period to which the diagram refers, viz. 1838-1904, the
following constants were found by this method:—

   Infantile mortality, mean annual change     -0.21
                        standard deviation      9.63
   General mortality,   mean annual change     -0.09
                        standard deviation      1.14
   Coefficient of correlation                  +0.77

This is a much higher correlation than would arise from the
mere fact that the deaths of infants form part of the general
mortality, and consequently there must be a high correlation
between the annual changes in the mortality of those who are over
and under 1 year of age. (Cf. Exercises 7 and 8, Chap. XI.)
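The difference method can be tried on the six years printed in the table of § 14. This Python sketch builds columns 3 and 5 from columns 2 and 4 and correlates them; five changes are far too few for a reliable coefficient, so the result is only an illustration and is not expected to equal the +0.77 found for the whole period 1838-1904.

```python
from math import sqrt

INFANT = [150, 151, 154, 145, 152, 150]        # col. 2, 1838-1843
GENERAL = [22.4, 21.8, 22.9, 21.6, 21.7, 21.2]  # col. 4, 1838-1843


def differences(series):
    """Increase or decrease from the year before, rounded to one decimal."""
    return [round(b - a, 1) for a, b in zip(series, series[1:])]


def pearson(xs, ys):
    """Product-moment correlation coefficient (moments divided by N)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sxx = sum((x - mx) ** 2 for x in xs) / n
    syy = sum((y - my) ** 2 for y in ys) / n
    return sxy / sqrt(sxx * syy)


col3, col5 = differences(INFANT), differences(GENERAL)
print(col3)                           # [1, 3, -9, 7, -2]
print(col5)                           # [-0.6, 1.1, -1.3, 0.1, -0.5]
print(round(pearson(col3, col5), 3))  # 0.746 on these five changes
```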

This method, which appears to have been first used by Miss
Cave and by Mr Hooker independently in the papers cited in
refs. 4 and 6, has recently been generalised by “Student” and
the theory fully developed by O. Anderson (cf. refs. 13, 14, 15).
By taking the first differences the influence of the slower changes
of the two variables with time may not be wholly eliminated,
but this elimination may be more completely effected by proceeding
        <pb n="225" />
to the second differences, i.e. by working out the successive
differences of the differences in col. 3 and in col. 5 before
correlating. It may even be desirable to proceed to third, fourth or
higher differences before correlating.

Fig. 41.—Infantile and General Mortality in England and Wales, 1838-1904.

Fig. 42.—Marriage-rate and Foreign Trade, England and Wales, 1855-1904.
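Repeated differencing, as suggested at the end of § 14, is easy to express. The sketch below illustrates only the idea, not the variate-difference theory of refs. 13-15; the function name and the trend are invented. Each pass shortens the series by one term, and a smooth quadratic trend, standing in for a slow secular movement, still shows as a steady drift in the first differences but is reduced to a constant by the second, so it no longer contributes any variation to be correlated.

```python
def nth_differences(series, k):
    """Difference a series k times (k = 1 gives the year-to-year changes)."""
    out = list(series)
    for _ in range(k):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


trend = [3 + 2 * t + 0.5 * t * t for t in range(8)]   # invented smooth trend
print(nth_differences(trend, 1))   # [2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5]
print(nth_differences(trend, 2))   # [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
```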
15. Illustration v.—The two curves of fig. 42 show (1) the
marriage-rate (persons married per 1000 of the population) for
England and Wales ; (2) the values of exports and imports per
head of the population of the United Kingdom for every year
from 1855 to 1904. Inspection of the diagram suggests a similar
relation to that of the last example, the one variable showing a
        <pb n="226" />
rise from one year to the next when the other rises, and a fall
when the other falls. The movement of both variables is, how-
ever, of a much more regular kind than that of mortality,
resembling a series of “waves” superposed on a steady general
trend, and it is the “waves” in the two variables—the short-period
movements, not the slower trends—which are so clearly related.

16. It is not difficult, moreover, to separate the short-period
oscillations, more or less approximately, from the slower movement.
Suppose the marriage-rate for each year replaced by the average
of an odd number of years of which it is the centre, the number
being as near as may be the same as the period of the “waves”—
e.g. nine years. If these short-period averages were plotted on
the diagram instead of the rates of the individual years, we should
evidently obtain a smoother curve which would clearly exhibit
the trend and be practically free from the conspicuous waves.
The excess or defect of each annual rate above or below the
trend, if plotted separately, would therefore give the “waves”
apart from the slower changes. The figures for foreign trade
may be treated in the same way as the marriage-rate, and we
can accordingly work out the correlation between the waves or
rapid fluctuations, undisturbed by the movements of longer period,
however great they may be. The arithmetic may be carried out
in the form of the following table, and the correlation worked out
in the ordinary way between the figures of columns 4 and 7.

   1.      2.            3.           4.           5.               6.           7.
  Year  Marriage-rate  Nine Years'  Difference   Exports+Imports  Nine Years'  Difference
        per 1000.      Average.     (2)-(3).     £ per head.      Average.     (5)-(6).

  1855     16.2           —            —             9.36            —            —
  1856     16.7           —            —            11.14            —            —
  1857     16.…           —            —            11.85            —            —
  1858     16.0           —            —            10.73            —            —
  1859     17.0          16.6         +0.5          11.72           12.15        -0.43
  1860     17.1          16.6         +0.5          13.03           12.94        +0.09
  1861     16.3          16.7         -0.4          13.01           13.52        -0.51
  1862     16.1          16.8         -0.7          13.40           14.17        -0.77
  1863     16.8          16.9         -0.1          15.13           14.81        +0.32
  1864     17.2           —            —            16.43            —            —
  1865     17.5           —            —            16.37            —            —
  1866     17.6           —            —            17.72            —            —
  1867     16.5           —            —            16.47            —            —
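The arithmetic of columns 6 and 7 can be re-run from the trade figures of column 5. In this sketch (the function name is hypothetical) each year's value is compared with the mean of the nine years centred on it, so only years with four neighbours on each side get a result; for 1859-1863 it reproduces the averages 12.15, 12.94, 13.52, 14.17, 14.81 and the differences -0.43, +0.09, -0.51, -0.77, +0.32. The same function could be applied to the marriage-rates of column 2.

```python
TRADE = {  # exports + imports, £ per head (column 5)
    1855: 9.36, 1856: 11.14, 1857: 11.85, 1858: 10.73, 1859: 11.72,
    1860: 13.03, 1861: 13.01, 1862: 13.40, 1863: 15.13, 1864: 16.43,
    1865: 16.37, 1866: 17.72, 1867: 16.47,
}


def centred_deviations(series_by_year, width=9):
    """For each year with a full window, return (centred average, value
    minus that average), both rounded to two decimals. `width` must be odd."""
    years = sorted(series_by_year)
    values = [series_by_year[y] for y in years]
    half = width // 2
    result = {}
    for i in range(half, len(values) - half):
        average = sum(values[i - half:i + half + 1]) / width
        result[years[i]] = (round(average, 2), round(values[i] - average, 2))
    return result


for year, (average, deviation) in centred_deviations(TRADE).items():
    print(year, average, f"{deviation:+.2f}")
```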

17. Fig. 43 is drawn from the figures of columns 4 and 7, and
shows very well how closely the oscillations of the marriage-rate
are related to those of trade. For the period 1861-95 the
correlation between the two oscillations (Hooker, ref. 5) is 0.86.
The method may obviously be extended by correlating the deviation
        <pb n="227" />
of the marriage-rate in any one year with the deviation of
the exports and imports of the year before, or two years before,
instead of the same year; if a sufficient number of years be
taken, an estimate may be made, by interpolation, of the time-
difference that would make the correlation a maximum if it were
possible to obtain the figures for exports and imports for periods
other than calendar years. Thus Mr Hooker finds (ref. 5) that
on an average of the years 1861-95 the correlation would be a
maximum between the marriage-rate and the foreign trade of
about one-third of a year earlier. The method is an extremely
useful one, and is obviously applicable to any similar case. The
student should refer to the paper by Mr Hooker, cited. Reference
may also be made to ref. 10, in which several diagrams are given
similar to fig. 43, and the nature of the relationship between the
marriage-rate and such factors as trade, unemployment, etc., is
discussed, it being suggested that the relation is even more
complex than appears from the above. The same method of
separating the short-period oscillations was used at an earlier
date by Poynting in ref. 16, to which the student is referred
for a discussion of the method.

Fig. 43.—Fluctuations in (1) Marriage-rate and (2) Foreign Trade (Exports
+ Imports per head) in England and Wales: the Curves show Deviations
from 9-year means. Data of R. H. Hooker, Jour. Roy. Stat. Soc., 1901.
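A sketch of the lagged correlation of § 17 and of one way of interpolating the best time-difference. The text does not say how the interpolation was carried out: fitting a parabola through the coefficients at three successive lags, as here, is an assumption, and the function names are illustrative. With `lag = 1`, each year's marriage-rate deviation is paired with the previous year's trade deviation.

```python
from math import sqrt


def pearson(xs, ys):
    """Product-moment correlation coefficient (moments divided by N)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sxx = sum((x - mx) ** 2 for x in xs) / n
    syy = sum((y - my) ** 2 for y in ys) / n
    return sxy / sqrt(sxx * syy)


def lagged_correlation(a, b, lag):
    """Correlate a[t] with b[t - lag] for a non-negative whole-year lag."""
    if lag:
        a, b = a[lag:], b[:-lag]
    return pearson(a, b)


def interpolated_best_lag(r_prev, r_mid, r_next, mid_lag):
    """Lag at the vertex of the parabola through the coefficients found at
    mid_lag - 1, mid_lag and mid_lag + 1."""
    curvature = r_prev - 2 * r_mid + r_next
    if curvature >= 0:
        raise ValueError("the three coefficients do not bracket a maximum")
    return mid_lag + (r_prev - r_next) / (2 * curvature)


# Invented coefficients at lags 0, 1 and 2 years.
print(round(interpolated_best_lag(0.855, 0.655, -0.545, 1), 3))   # 0.3
```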

18. It was briefly mentioned in § 9 of the last chapter that
the treatment of cases when the regression was non-linear was,
in general, somewhat difficult. Such cases lie strictly outside
the scope of the present volume, but it may be pointed out
that if a relation between X and Y be suggested, either by
        <pb n="228" />
theory or by previous experience, it may be possible to throw
that relation into the form

$$Y = A + B\,\phi(X),$$

where A and B are the only unknown constants to be determined.
If a correlation-table be then drawn up between Y and φ(X)
instead of Y and X, the regression will be approximately linear.
Thus in Table V. of the last chapter, if X be the rate of
discount and Y the percentage of reserves on deposits, a
diagram of the curves of regression, or curves on which the
means of arrays lie, suggests that the relation between X and Y
is approximately of the form

$$X(Y - B) = A,$$

A and B being constants; that is,

$$XY = A + BX.$$

Or, if we make XY a new variable, say Z,

$$Z = A + BX.$$

Hence, if we draw up a new correlation-table between X and Z
the regression will probably be much more closely linear.

If the relation between the variables be of the form

$$Y = AB^{X},$$

we have

$$\log Y = \log A + X \log B,$$

and hence the relation between log Y and X is linear. Similarly,
if the relation be of the form

$$X^{n}Y = A,$$

we have

$$\log Y = \log A - n \log X,$$

and so the relation between log Y and log X is linear. By
means of such artifices for obtaining correlation-tables in
which the regression is linear, it may be possible to do a good
deal in difficult cases whilst using elementary methods only.
The advanced student should refer to ref. 17 for a different
method of treatment.
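The three artifices can be combined with an ordinary straight-line fit. A sketch, assuming the constants are estimated by least squares on the transformed variables (the text only says that the transformed regression should be nearly linear; it does not prescribe this fitting rule), with all Y, and for the power form all X, positive. The function names are illustrative.

```python
from math import exp, log


def fit_line(xs, ys):
    """Least-squares intercept and slope of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope


def fit_reciprocal(xs, ys):
    """X(Y - B) = A, fitted as Z = XY = A + BX; returns (A, B)."""
    return fit_line(xs, [x * y for x, y in zip(xs, ys)])


def fit_exponential(xs, ys):
    """Y = A * B**X, fitted as log Y = log A + X log B; returns (A, B)."""
    log_a, log_b = fit_line(xs, [log(y) for y in ys])
    return exp(log_a), exp(log_b)


def fit_power(xs, ys):
    """X**n * Y = A, fitted as log Y = log A - n log X; returns (A, n)."""
    log_a, slope = fit_line([log(x) for x in xs], [log(y) for y in ys])
    return exp(log_a), -slope


A, B = fit_exponential([1, 2, 3, 4], [6, 12, 24, 48])   # invented data
print(round(A, 6), round(B, 6))                           # 3.0 2.0
```

Note that least squares on log Y weights the observations differently from a fit to Y itself, so with scattered data the two need not give the same constants.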

19. The only strict method of calculating the correlation
coefficient is that described in Chapter IX. from the formula

$$r = \frac{\Sigma(xy)}{N\sigma_1\sigma_2}.$$

Approximations to this value may, however, be
        <pb n="229" />
found in various ways, for the most part dependent either (1)
on the formulæ for the two regressions $r\sigma_1/\sigma_2$ and $r\sigma_2/\sigma_1$, or (2) on
the formulæ for the standard deviations of the arrays $\sigma_1\sqrt{1-r^2}$
and $\sigma_2\sqrt{1-r^2}$. Such approximate methods are not recommended
for ordinary use, as they will lead to different results in different
hands, but a few may be given here, as being occasionally useful
for estimating the value of the correlation in cases where the
data are not given in such a shape as to permit of the proper
calculation of the coefficient.

(1) The means of rows and columns are plotted on a diagram,
and lines fitted to the points by eye, say by shifting about
a stretched black thread until it seems to run as near as may
be to all the points. If $b_1$, $b_2$ be the slopes of these two lines
to the vertical and the horizontal respectively,

$$r = \sqrt{b_1 b_2},$$

the sign of r being that of the two slopes. Hence the value of r
may be estimated from any such diagram
as figs. 36-40 in Chapter IX., in the absence of the original
table. Further, if a correlation-table be not grouped by
equal intervals, it may be difficult to calculate the product
sum, but it may still be possible to plot approximately a diagram
of the two lines of regression, and so determine roughly the
value of r. Similarly, if only the means of two rows and
two columns, or of one row and one column in addition to the
means of the two variables, are known, it will still be possible
to estimate the slopes of RR and CC, and hence the correlation
coefficient.
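When the two slopes are known, the arithmetic of method (1) is a single square root. In the sketch below the slopes are computed exactly from a small set of invented paired values instead of being fitted by eye, which shows that the square root of their product then reproduces the product-sum coefficient; the function names are illustrative, and the sign of r is taken as the common sign of the two slopes.

```python
from math import copysign, sqrt


def r_from_slopes(b1, b2):
    """r from the slopes of RR (to the vertical) and CC (to the horizontal)."""
    if 0 > b1 * b2:
        raise ValueError("the two regression slopes must have the same sign")
    return copysign(sqrt(b1 * b2), b1)


def regression_slopes(xs, ys):
    """b1: regression of X on Y; b2: regression of Y on X."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / syy, sxy / sxx


xs = [1, 2, 2, 3, 3, 4]      # invented data
ys = [1, 1, 2, 2, 3, 3]
b1, b2 = regression_slopes(xs, ys)          # b1 = 1.0, b2 = 4 / 5.5
print(round(r_from_slopes(b1, b2), 4))      # 0.8528, i.e. 4 / sqrt(22)
```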

(2) The means of one set of arrays only, say the rows, are
calculated, and also the two standard-deviations $\sigma_1$ and $\sigma_2$. The
means are then plotted on a diagram, using the standard-deviation
of each variable as the unit of measurement, and a line fitted by
eye. The slope of this line to the vertical is r. If the standard
deviations be not used as the units of measurement in plotting,
the slope of the line to the vertical is $r\sigma_1/\sigma_2$, and hence r will be
obtained by dividing the slope by the ratio of the
standard-deviations, $\sigma_1/\sigma_2$.

This method, or some variation of it, is often useful as a
makeshift when the data are too incomplete to permit of the
proper calculation of the correlation, only one line of regression
and the ratio of the dispersions of the two variables being required :
the ratio of the quartile deviations, or other simple measures of
dispersion, will serve quite well for rough purposes in lieu of the
ratio of standard-deviations. As a special case, we may note that
        <pb n="230" />
if the two dispersions are approximately the same, the slope of
RR to the vertical is r.
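Method (2) can be sketched in the same way. The code forms the mean X of each array (each distinct value of Y), fits a line to those means by least squares with each mean weighted by the number of observations in its array (a stand-in for fitting by eye, and an assumption of this sketch), and divides its slope by the ratio of the dispersions. With exact arithmetic this weighted fit coincides with the regression of X on Y, so the result agrees with the product-sum value; a rougher measure of dispersion can be supplied through the hypothetical `spread` argument.

```python
from collections import defaultdict
from math import sqrt


def sd(values):
    """Standard-deviation with N in the denominator."""
    m = sum(values) / len(values)
    return sqrt(sum((v - m) ** 2 for v in values) / len(values))


def r_from_array_means(xs, ys, spread=sd):
    """Slope of the line through the X-array means, divided by
    spread(X) / spread(Y)."""
    arrays = defaultdict(list)           # X values grouped by their Y
    for x, y in zip(xs, ys):
        arrays[y].append(x)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum(len(a) * (y - my) * (sum(a) / len(a) - mx)
              for y, a in arrays.items())
    den = sum(len(a) * (y - my) ** 2 for y, a in arrays.items())
    slope = num / den                    # slope of RR to the vertical
    return slope / (spread(xs) / spread(ys))


xs = [1, 2, 2, 3, 3, 4]      # invented data
ys = [1, 1, 2, 2, 3, 3]
print(round(r_from_array_means(xs, ys), 4))   # 0.8528
```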

Plotting the medians of arrays on a diagram with the quartile
deviations as units, and measuring the slope of the line, was the
method of determining the correlation coefficient (“Galton's
function”) used by Sir Francis Galton, to whom the introduction
of such a coefficient is due. (Refs. 2-4 of Chap. IX. p. 188.)

(3) If $s_1$ be the standard deviation of errors of estimate like
$x - b_1 y$, we have from Chap. IX. § 11—

$$s_1^2 = \sigma_1^2(1 - r^2),$$

and hence

$$r = \sqrt{1 - \frac{s_1^2}{\sigma_1^2}}.$$

But if the dispersions of arrays do not differ largely, and the
regression is nearly linear, the value of $s_1$ may be estimated from
the average of the standard-deviations of a few rows, and r
determined—or rather estimated—accordingly. Thus in Table III.,
Chap. IX., the standard-deviations of the ten columns headed
62.5-63.5, 63.5-64.5, etc., are—

    2.56, 2.11, 2.55, 2.24, 2.23, 2.60, 2.26, 2.26, 2.45, 2.33;  mean 2.359.

The standard-deviation of the stature of all sons is 2.75: hence
approximately

$$r = \sqrt{1 - \left(\frac{2.359}{2.75}\right)^2} = 0.514.$$

This is the same as the value found by the product-sum method
to the second decimal place. It would be better to take an
average by counting the square of each standard-deviation
once for each observation in the column (or “weighting”
it with the number of observations in the column), but in the
present case this would only lead to a very slightly different
result, viz. 2.362 for the average standard-deviation and 0.512 for r.
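The arithmetic of method (3) for Table III. can be reproduced directly from the ten column standard-deviations quoted above. The weighted form described at the end of the paragraph needs the number of observations in each column, which is not reproduced here, so `weighted_sd` (a hypothetical helper) is shown but not applied to these figures; the last line simply re-runs the final step with the weighted value 2.362 quoted in the text.

```python
from math import sqrt

column_sds = [2.56, 2.11, 2.55, 2.24, 2.23, 2.60, 2.26, 2.26, 2.45, 2.33]
sd_sons = 2.75                                   # s.d. of all sons' statures

s1 = sum(column_sds) / len(column_sds)           # simple mean of the s.d.'s
r = sqrt(1 - (s1 / sd_sons) ** 2)
print(round(s1, 3), round(r, 3))                 # 2.359 0.514


def weighted_sd(sds, counts):
    """Square root of the mean squared s.d., each square counted once for
    every observation in its column (the 'weighted' average of the text)."""
    return sqrt(sum(n * s * s for s, n in zip(sds, counts)) / sum(counts))


print(round(sqrt(1 - (2.362 / sd_sons) ** 2), 3))   # 0.512
```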

20. The Correlation Ratio.—The method clearly would not
give an approximation to the correlation coefficient, however, in
the case of such tables as V. and VI. of Chap. IX., in which the
means of successive arrays do not lie closely round straight lines.
        <pb n="231" />
In such cases it would always tend to give a value for r markedly
higher than that given by the product-sum method. The
product-sum method gives in fact a value based on the standard-
deviation round the line of regression; the method used above
gives a value dependent on the standard-deviation round a line
which sweeps through all the means of arrays, and the second
standard-deviation is necessarily less than the first. We reach,
therefore, a generalised coefficient which measures the approach
towards a curvilinear line of regression of any form.

Let $s_{ay}$ denote the standard-deviation of any array of X's, and
let n, as before, be the number of observations in this array (Chap.
IX., § 11), and further let

$$\sigma_{ay}^2 = \frac{\Sigma(n\,s_{ay}^2)}{N}. \qquad (2)$$

Then $\sigma_{ay}$ is an average of the standard-deviations of the arrays
obtained as suggested at the end of the last section. Now let

$$\eta_{xy}^2 = 1 - \frac{\sigma_{ay}^2}{\sigma_x^2}. \qquad (3)$$

Then $\eta_{xy}$ is termed by Professor Pearson a correlation-ratio (ref.
18). As there are clearly two correlation-ratios for any one table,
it should be distinguished as the correlation-ratio of X on Y: it
measures the approach of values of X associated with given
values of Y to a single-valued relationship of any form. The
calculation would be exceedingly laborious if we had actually to
evaluate $\sigma_{ay}$, but this may be avoided and the work greatly
simplified by the following consideration. If $M_x$ denote the mean
of all X's, $m_x$ the mean of an array, then we have by the general
relation given in § 11 of Chap. VIII. (p. 142)

$$N\sigma_x^2 = \Sigma n\left(s_{ay}^2 + [M_x - m_x]^2\right).$$

Or, using $\sigma_{m_x}$ to denote the standard-deviation of $m_x$,

$$\sigma_x^2 = \sigma_{ay}^2 + \sigma_{m_x}^2. \qquad (4)$$

Hence, substituting in (3),

$$\eta_{xy} = \frac{\sigma_{m_x}}{\sigma_x}. \qquad (5)$$

The correlation-ratio of X on Y is therefore determined when we
have found, in addition to the standard-deviation of X, the
standard-deviation of the means of its arrays.
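Equation (5) translates directly into code for ungrouped data. The sketch below, with invented figures and illustrative function names, groups the X's by their Y, takes the standard-deviation of the array means (each weighted by the size of its array) and divides by the standard-deviation of all the X's. The example is deliberately curved, X rising on both sides of Y = 0, so that the correlation coefficient is nil while the correlation-ratio is close to 1.

```python
from collections import defaultdict
from math import sqrt


def correlation_ratio(xs, ys):
    """eta of X on Y: s.d. of the array means of X divided by s.d. of X."""
    n = len(xs)
    mean_x = sum(xs) / n
    arrays = defaultdict(list)                 # X values grouped by their Y
    for x, y in zip(xs, ys):
        arrays[y].append(x)
    var_means = sum(len(a) * (sum(a) / len(a) - mean_x) ** 2
                    for a in arrays.values()) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    return sqrt(var_means / var_x)


def pearson(xs, ys):
    """Product-moment correlation coefficient (moments divided by N)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sxx = sum((x - mx) ** 2 for x in xs) / n
    syy = sum((y - my) ** 2 for y in ys) / n
    return sxy / sqrt(sxx * syy)


ys = [-2, -2, -1, -1, 0, 0, 1, 1, 2, 2]
xs = [4.1, 3.9, 1.1, 0.9, 0.1, -0.1, 1.1, 0.9, 4.1, 3.9]   # roughly Y squared
print(round(abs(pearson(xs, ys)), 3))       # 0.0: no linear relation
print(round(correlation_ratio(xs, ys), 3))  # 0.998
```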
21. The correlation-ratio of X on Y cannot be less than the
correlation-coefficient for X and Y taken without regard to sign, and
$\eta_{xy}^2 - r^2$ is a measure of
the divergence of the regression of X on Y from linearity. For
        <pb n="232" />
if d denote, as in Chap. IX., the deviation of the mean of an
array of X's from the line of regression, we have by the relation
of Chap. IX., § 11, p. 172

$$\sigma_x^2(1 - r^2) = \sigma_{ay}^2 + \sigma_d^2. \qquad (6)$$

Substituting $\sigma_{ay}^2 = \sigma_x^2(1 - \eta_{xy}^2)$ from (3), we have

$$\eta_{xy}^2 - r^2 = \frac{\sigma_d^2}{\sigma_x^2}. \qquad (7)$$

But $\sigma_d^2$ cannot be negative, and therefore $\eta_{xy}$ is not less than r
in absolute value. The magnitude of $\sigma_d$, and therefore of
$\eta_{xy}^2 - r^2$, measures the
divergence of the actual line through the means of arrays from
the line of regression.
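Equation (7) holds exactly for any table, not merely approximately, because the line of regression is fitted by least squares to the same observations. The sketch below (invented data, hypothetical function name) computes both sides for the regression of X on Y, weighting each array by its frequency: the two numbers agree to rounding error, and both vanish when the array means lie on a straight line.

```python
from collections import defaultdict


def linearity_check(xs, ys):
    """Return (eta**2 - r**2, sigma_d**2 / sigma_x**2) for the regression of
    X on Y; by equation (7) the two values should be equal."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    var_x = sum((x - mx) ** 2 for x in xs) / n
    var_y = sum((y - my) ** 2 for y in ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    r2 = cov * cov / (var_x * var_y)
    b1 = cov / var_y                        # slope of the regression of X on Y
    arrays = defaultdict(list)              # X values grouped by their Y
    for x, y in zip(xs, ys):
        arrays[y].append(x)
    eta2_num = d2_num = 0.0
    for y, a in arrays.items():
        m = sum(a) / len(a)                 # mean of this array
        d = m - (mx + b1 * (y - my))        # its deviation from the line
        eta2_num += len(a) * (m - mx) ** 2
        d2_num += len(a) * d * d
    return eta2_num / (n * var_x) - r2, d2_num / (n * var_x)


lhs, rhs = linearity_check([1, 2, 4, 3, 7, 9, 8], [0, 0, 1, 1, 2, 2, 2])
print(round(lhs, 6), round(rhs, 6))          # the two values agree
```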

It should be noted that, owing to the fluctuations of sampling,
r and $\eta$ are almost certain to differ slightly, even though the
regression may be truly linear. The observed value of $\eta^2 - r^2$
must be compared with the values that may arise owing to
fluctuations of sampling alone, before a definite significance can
be ascribed to it (cf. Pearson, ref. 19, Blakeman, ref. 22, and the
formulæ cited therefrom on p. 352 below).

22. The following table illustrates the form of the arithmetic
for the calculation of the correlation-ratio of son’s stature on
father’s stature (Table III. of Chap. IX. p. 160). In the first
column is given the type of the array (stature of father); in the
second, the mean stature of sons for that array; in the third, the
difference of the mean of the array from the mean stature of all
sons. In the fourth column these differences are squared, and in
the sixth they are multiplied by the frequency of the array, two
decimal places only having been retained as sufficient for the
present purpose. The sum-total of the last column divided by
the number of observations (1078) gives $\sigma_{m_y}^2 = 2.058$, or $\sigma_{m_y} = 1.43$.
As the standard-deviation of the sons' stature is 2.75 in. (cf.
Chap. IX., question 3), $\eta_{yx} = 0.52$. Before taking the differences
for the third column of such a table, it is as well to check the
means of the arrays by recalculating from them the mean of the
whole distribution, i.e. multiplying each array-mean by its
frequency, summing, and dividing by the number of observations.
The form of the arithmetic may be varied, if desired, by working
from zero as origin, instead of taking differences from the true
mean. The square of the mean must then be subtracted from
$\Sigma(f \cdot m^2)/N$ to give $\sigma_{m_y}^2$.

If the second correlation-ratio for this table be worked out in
the same way, the value will be found to be the same to the
second place of decimals: the two correlation-ratios for this table
are, therefore, very nearly identical, and only slightly greater
than the correlation-coefficient (0.51). Both regressions, it
        <pb n="233" />
follows from the last section, are very nearly linear, a result
which is confirmed by the diagram of the regression lines (fig. 37).
On the other hand, it is evident from fig. 39, p. 176, that we
should expect the two correlation-ratios for Table VI. of the same
chapter to differ considerably from each other and from the
correlation-coefficient. The values found are $\eta_{xy} = 0.14$,
$\eta_{yx} = 0.38$ ($r = -0.014$): $\eta_{xy}$ is comparatively low as proportions
of male births differ little in the successive arrays, but $\eta_{yx}$ is
higher since the line of regression of Y on X is sharply curved.
For Table VIII., p. 183, the two ratios are $\eta_{xy} = 0.46$,
$\eta_{yx} = 0.39$ ($r = 0.34$). The confirmation of these values is
left to the student.

The student should notice that the correlation-ratio only
affords a satisfactory test when the number of observations is
sufficiently large for a grouped correlation table to be formed.

In the case of a short series of observations such as that given in
Table VII., p. 178, the method is inapplicable.
CALCULATION OF THE CORRELATION-RATIO: Example.—Son's Stature on
Father's Stature: Data of Table III., Chap. IX., p. 160.

    1.          2.          3.              4.           5.          6.
  Type of     Mean of     Difference      Difference   Frequency.  Frequency
  Array       Array       from Mean of    squared.                 × (4).
  (Father's   (Son's      all Sons
  Stature).   Stature).   (68.66).

    59         64.67        -3.99          15.9201        3          47.76
    60         65.64        -3.02           9.1204        3.5        31.92
    61         66.34        -2.32           5.3824        8          43.06
    62         65.56        -3.10           9.6100       17         163.37
    63         66.68        -1.98           3.9204       33.5       131.33
    64         66.74        -1.92           3.6864       61.5       226.71
    65         67.19        -1.47           2.1609       95.5       206.37
    66         67.61        -1.05           1.1025      142         156.56
    67         67.95        -0.71           0.5041      137.5        69.31
    68         69.07        +0.41           0.1681      154          25.89
    69         69.39        +0.73           0.5329      141.5        75.41
    70         69.74        +1.08           1.1664      116         135.30
    71         70.50        +1.84           3.3856       78         264.08
    72         70.87        +2.21           4.8841       49         239.32
    73         72.00        +3.34          11.1556       28.5       317.93
    74         71.50        +2.84           8.0656        4          32.26
    75         71.73        +3.07           9.4249        5.5        51.84

  Total                                               1078        2218.42

  $\sigma_{m_y}^2 = 2218.42/1078 = 2.058$;  $\sigma_{m_y} = 1.43$.
  $\eta_{yx} = 1.43/2.75 = 0.52$.
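The arithmetic of the correlation-ratio table can be re-run as a check, including the check on the array means recommended in § 22 and the alternative of working from zero as origin. The type of array, array mean and frequency below are copied from the table; the weighted mean of the array means should reproduce 68.66 to two decimals, the small remainder being due to rounding of the array means.

```python
from math import sqrt

# (type of array, mean of array, frequency), copied from the table
ROWS = [
    (59, 64.67, 3), (60, 65.64, 3.5), (61, 66.34, 8), (62, 65.56, 17),
    (63, 66.68, 33.5), (64, 66.74, 61.5), (65, 67.19, 95.5),
    (66, 67.61, 142), (67, 67.95, 137.5), (68, 69.07, 154),
    (69, 69.39, 141.5), (70, 69.74, 116), (71, 70.50, 78),
    (72, 70.87, 49), (73, 72.00, 28.5), (74, 71.50, 4), (75, 71.73, 5.5),
]
MEAN_ALL_SONS = 68.66
SD_SONS = 2.75

N = sum(f for _, _, f in ROWS)
check_mean = sum(f * m for _, m, f in ROWS) / N          # about 68.6615
var_means = sum(f * (m - MEAN_ALL_SONS) ** 2 for _, m, f in ROWS) / N
eta_yx = sqrt(var_means) / SD_SONS

# Working from zero as origin: subtract the square of the mean.
var_means_zero = sum(f * m * m for _, m, f in ROWS) / N - check_mean ** 2

print(N, round(check_mean, 2))                                   # 1078.0 68.66
print(round(var_means, 3), round(sqrt(var_means), 2), round(eta_yx, 2))  # 2.058 1.43 0.52
print(round(var_means_zero, 3))                                  # 2.058
```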
        <pb n="234" />
REFERENCES.

Illustrative Applications, principally to Economic Statistics,

and Practical Methods.

(1) Yue, G. U., ‘‘ On the Correlation of total Pauperism with Proportion ot
Out-relief,” Economic Jour., vol. v., 1895, p. 603, and vol. vi., 1896,
p. 613.

(2) YuLg, G. U., “An Investigation into the Causes of Changes in Pauperism

: in England chiefly during the last two Intercensal Decades,” Jour.
Roy. Stat. Soec., vol. Ixii., 1899, p. 249. (Cf. Illustration i.)

(3) PEARSON, KARL, ArLicE Lrg, and L. BRAMLEY MooRrE, ‘‘Genetic
(reproductive) Selection: Inheritance of Fertility in Man and of
Fecundity in Thoroughbred Racehorses,” Phil. Trans. Roy. Soc., Series
A, vol. cxcii., 1899, p. 257. (Cf. lllustration ii.)

(4) CAVE-BROWNE-CAVE, F. E., “On the Influence of the Time-factor on
the Correlation between the Barometric Heights at Stations more than
1000 miles apart,” Proc. Roy. Soc., vol. Ixxiv., 1904, pp. 403-413.
(The difference-method of Illustration iv. used.)

(5) Hooker, R. H., “On the Correlation of the Marviage-rate with Trade,”
Jour. Roy. Stat, Soc., vol. lxiv., 1901, p. 485. (The method of
Illustration v.)

(6) Hooker, R. H., ¢ On the Correlation of Successive Observations : illus-
trated by Corn-prices,” ¢bid., vol. lxviii., 1905, p. 696. (The method
of Illustration iv.)

(7) HooxkEr, R. H., “The Correlation of the Weather and the Crops,” 4bsd.,
vol. Ixx., 1907, p. 1. (Cf. Illustration iii.)

(8) Norron, J. P., Statistical Studies in the New York Money Market;
Macmillan Co., New York, 1902. (Applications to financial statistics :
an instantaneous average;method, analogous to that of illustration v., is
employed, but the instantaneous average is obtained by an interpolated
logarithmic curve.)

(9) MarcH, L., ‘‘Comparaison numérique de courbes statistiques,” Jour.
de la soctété de statistique de Paris, 1905, pp. 255 and 306. (Uses the
methods of Illustrations iv. and v., but obtaining the instantaneous
average in the latter case by graphical interpolation.)

(10) YULE, G. U., “On the Changes in the Marriage and Birth Rates in
England and Wales during the past Half Century, with an Inquiry as
to their probable Causes,” Jour. Roy. Stat. Soc., vol. lxix., 1906, p. 88.

(11) HERON, D., On the Relation of Fertility in Man to Social Status,
“Drapers’ Co. Research Memoirs: Studies in National Deterioration,”
I.; Dulau &amp; Co., London, 1906.

(12) JACOB, S. M., “On the Correlations of Areas of Matured Crops and the
Rainfall,” Mem. Asiatic Soc. Bengal, vol. ii., 1910, p. 847.

(13) “STUDENT,” “The Elimination of Spurious Correlation due to Position
in Time or Space,” Biometrika, vol. x., 1914, pp. 179-180. (The
extension of the difference-method by the use of successive differences.)

(14) ANDERSON, O., “Nochmals über ‘The Elimination of Spurious
Correlation due to Position in Time or Space,’” Biometrika, vol. x., 1914,
pp. 269-279. (Detailed theory of the same extended method.)

(15) CAVE, BEATRICE M., and KARL PEARSON, “Numerical Illustrations of
the Variate-difference Correlation Method,” Biometrika, vol. x., 1914,
pp. 340-355.

(16) POYNTING, J. H., “A Comparison of the Fluctuations in the Price of
Wheat, and in the Cotton and Silk Imports into Great Britain,” Jour.
        <pb n="235" />
Roy. Stat. Soc., vol. xlvii., 1884, p. 34. (This paper was written
before the invention of the correlation coefficient, but is cited because
the method of Illustration v. is used to separate the periodic from the
secular movement: see especially § ix. on the process of averaging
employed.)

Theory of Correlation in the case of Non-linear Regression,
and Curve or Line fitting generally.

(17) PEARSON, KARL, “On the Systematic Fitting of Curves to Observations
and Measurements,” Biometrika, vol. i. p. 265, and vol. ii. p. 1, 1902.
(The second part is useful for the fitting of curves in cases of non-linear
regression.)

(18) PEARSON, KARL, On the General Theory of Skew Correlation and
Non-linear Regression, “Drapers’ Co. Research Memoirs: Biometric Series,”
II.; Dulau &amp; Co., London, 1905. (The “correlation ratio.”)

(19) PEARSON, KARL, “On a Correction to be made to the Correlation Ratio,”
Biometrika, vol. viii., 1911, p. 254.

(20) PEARSON, KARL, “On Lines and Planes of Closest Fit to Systems of
Points in Space,” Phil. Mag., 6th Series, vol. ii., 1901, p. 559.

(21) PEARSON, KARL, “On a General Theory of the Method of False
Position,” Phil. Mag., June 1903. (A method of curve fitting by
the use of trial solutions.)

(22) BLAKEMAN, J., “On Tests for Linearity of Regression in Frequency-
distributions,” Biometrika, vol. iv., 1905, p. 332.

(23) SNOW, E. C., “On Restricted Lines and Planes of Closest Fit to Systems
of Points in any number of Dimensions,” Phil. Mag., 6th Series, vol.
xxi., 1911, p. 567.

(24) SLUTSKY, E., “On the Criterion of Goodness of Fit of the Regression
Lines and the best Method of Fitting them to the Data,” Jour. Roy.
Stat. Soc., vol. lxxvii., 1913, pp. 78-84.

Abbreviated Methods of Calculation.
See also references to Chapter XVI.

(25) HARRIS, J. ARTHUR, “A Short Method of Calculating the Coefficient
of Correlation in the case of Integral Variates,” Biometrika, vol. vii.,
1909, p. 214. (Not an approximation, but a true short method.)

(26) HARRIS, J. ARTHUR, “On the Calculation of Intra-class and Inter-class
Coefficients of Correlation from Class-moments when the Number of
possible Combinations is large,” Biometrika, vol. ix., 1914, pp.
446-472.

        <pb n="236" />
        CHAPTER XI.
MISCELLANEOUS THEOREMS INVOLVING THE USE OF
THE CORRELATION-COEFFICIENT.

1. Introductory—2. Standard-deviation of a sum or difference—3-5.
Influence of errors of observation and of grouping on the
standard-deviation—6-7. Influence of errors of observation on the
correlation-coefficient (Spearman’s theorems)—8. Mean and
standard-deviation of an index—9. Correlation between indices—10.
Correlation-coefficient for a two- × two-fold table—11.
Correlation-coefficient for all possible pairs of N values of a
variable—12. Correlation due to heterogeneity of material—13.
Reduction of correlation due to mingling of uncorrelated with
correlated material—14-17. The weighted mean—18-19. Application
of weighting to the correction of death-rates, etc., for varying sex
and age-distributions—20. The weighting of forms of average other
than the arithmetic mean.

1. It has already been pointed out that a statistical measure, if
it is to be widely useful, should lend itself readily to algebraical
treatment. The arithmetic mean and the standard-deviation
derive their importance largely from the fact that they fulfil this
requirement better than any other averages or measures of dis-
persion ; and the following illustrations, while giving a number of
results that are of value in one branch or another of statistical
work, suffice to show that the correlation-coefficient can be treated
with the same facility. This might indeed be expected, seeing
that the coefficient is derived, like the mean and standard-devia-
tion, by a straightforward process of summation.

2. To find the Standard-deviation of the sum or difference $Z$ of
corresponding values of two variables $X_1$ and $X_2$.

Let $z$, $x_1$, $x_2$ denote deviations of the several variables from
their arithmetic means. Then if
$$Z = X_1 \pm X_2,$$
evidently
$$z = x_1 \pm x_2.$$
        <pb n="237" />
XI.—CORRELATION: MISCELLANEOUS THEOREMS. 211
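Squaring both sides of the equation and summing,
$$\Sigma(z^2) = \Sigma(x_1^2) + \Sigma(x_2^2) \pm 2\Sigma(x_1 x_2).$$
That is, if $r$ be the correlation between $x_1$ and $x_2$, and $\sigma$, $\sigma_1$, $\sigma_2$
the respective standard-deviations,
$$\sigma^2 = \sigma_1^2 + \sigma_2^2 \pm 2r\sigma_1\sigma_2. \qquad (1)$$
If $x_1$ and $x_2$ are uncorrelated, we have the important special case
$$\sigma^2 = \sigma_1^2 + \sigma_2^2. \qquad (2)$$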

The student should notice that in this case the standard-
deviation of the sum of corresponding values of the two variables
is the same as the standard-deviation of their difference.

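The same process will evidently give the standard-deviation of a
linear function of any number of variables. For the sum of a
series of variables $X_1, X_2, \ldots, X_n$ we must have
$$\sigma^2 = \sigma_1^2 + \sigma_2^2 + \cdots + \sigma_n^2 + 2r_{12}\sigma_1\sigma_2 + 2r_{13}\sigma_1\sigma_3 + \cdots,$$
$r_{12}$ being the correlation between $X_1$ and $X_2$, $r_{13}$ the correlation
between $X_1$ and $X_3$, and so on, one such product term appearing for every pair of variables.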

3. Influence of Errors of Observation on the Standard-deviation.
—The results of § 2 may be applied to the theory of errors of
observation. Let us suppose that, if any value of X be observed
a large number of times, the arithmetic mean of the observations
is approximately the true value, the arithmetic mean error being
zero. Then, the arithmetic mean error being zero for all values
of $X$, the error, say $\delta$, is uncorrelated with $X$. In this case, if $x_1$ be
an observed deviation from the arithmetic mean and $x$ the true
deviation, we have from the preceding section
$$\sigma_1^2 = \sigma^2 + \sigma_\delta^2. \qquad (3)$$
The effect of errors of observation is, consequently, to increase the
standard-deviation above its true value. The student should
notice that the assumption made does not imply the complete
independence of $X$ and $\delta$: he is quite at liberty to suppose that
errors fluctuate more, for example, with large than with small
values of $X$, as might very probably happen. In that case the
contingency-coefficient between $X$ and $\delta$ would not be zero,
although the correlation-coefficient might still vanish as supposed.
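The point about independence can be seen in a small simulation. This is a sketch under assumed conditions: normal true values, and an error whose spread is taken, hypothetically, as proportional to the value observed. The error is then uncorrelated with X although its magnitude plainly is not, and the observed variance comes out close to the sum in equation (3).

```python
import math
import random

def mean(v):
    return math.fsum(v) / len(v)

def var(v):
    m = mean(v)
    return math.fsum((a - m) ** 2 for a in v) / len(v)

def corr(u, v):
    mu, mv = mean(u), mean(v)
    cov = math.fsum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)
    return cov / math.sqrt(var(u) * var(v))

random.seed(2)
N = 100_000
x_true = [random.gauss(50, 10) for _ in range(N)]
# Error with mean zero for every X but spread proportional to X:
# uncorrelated with X, yet not independent of it.
err = [random.gauss(0, 0.05 * abs(x)) for x in x_true]
x_obs = [x + e for x, e in zip(x_true, err)]

print(f"r(X, delta)   = {corr(x_true, err):+.4f}")                     # close to zero
print(f"r(X, |delta|) = {corr(x_true, [abs(e) for e in err]):+.4f}")   # clearly positive
print(f"observed variance {var(x_obs):.2f}, true + error {var(x_true) + var(err):.2f}")
```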

4. Influence of Grouping on the Standard-deviation.—The
consequence of grouping observations to form the frequency
distribution is to introduce errors that are, in effect, errors of
        <pb n="238" />
measurement. Instead of assigning to any observation its true
value $X$, we assign to it the value $X_1$ corresponding to the centre
of the class-interval, thereby making an error $\delta$, where
$$X_1 = X + \delta.$$

To deduce from this equation a formula showing the nature of
the influence of grouping on the standard-deviation we must know
the correlation between the error $\delta$ and $X$ or $X_1$. If the original
distribution were a histogram, $X_1$ and $\delta$ would be uncorrelated,
the mean value of $\delta$ being zero for every value of $X_1$: further, the
square of the standard-deviation of $\delta$ would be $c^2/12$, where $c$ is
the class-interval (Chap. VIII. § 12, eqn. (10)). Hence, if $\sigma_1$ be the
standard-deviation of the grouped values $X_1$ and $\sigma$ the
standard-deviation of the true values $X$,
$$\sigma_1^2 = \sigma^2 - \frac{c^2}{12}.$$

But the true frequency distribution is rarely or never a
histogram, and trial on any frequency distribution approximating
to the symmetrical or slightly asymmetrical forms of fig. 5, p. 89,
or fig. 9 (a), p. 92, shows that grouping tends to increase rather
than reduce the standard-deviation. If we assume, as in § 3, that
the correlation between $\delta$ and $X$, instead of $\delta$ and $X_1$, is appreciably
zero and that the square of the standard-deviation of $\delta$ may be taken as $c^2/12$,
as before (the values of $\delta$ being to a first approximation uniformly
distributed over the class-interval when all the intervals are
considered together), then we have
$$\sigma^2 = \sigma_1^2 - \frac{c^2}{12}. \qquad (4)$$
This is a formula of correction for grouping (Sheppard’s correc-
tion, refs. 1 to 4) that is very frequently used, and that trial
(ref. 1) shows to give very good results for a curve approximating
closely to the form of fig. 5, p. 89. The strict proof of the
formula lies outside the scope of an elementary work : it is based
on two assumptions: (1) that the distribution of frequency is
continuous, (2) that the frequency tapers off gradually to zero
in both directions. The formula would not give accurate results
in the case of such a distribution as that of fig. 9 (b), p. 92, or
fig. 14, p. 97, neither is it applicable at all to the more divergent
forms such as those of figs. 15 et seq.
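The working of Sheppard's correction on a distribution of the kind the text requires (continuous, and tapering to zero at both ends) can be sketched as follows. The normal distribution, the class-interval of half a standard-deviation, the sample size and the seed are all assumptions made for illustration; class centres are placed at multiples of c.

```python
import math
import random

def var(v):
    # variance with divisor N
    m = math.fsum(v) / len(v)
    return math.fsum((a - m) ** 2 for a in v) / len(v)

random.seed(4)
c = 0.5                       # class-interval, in units of the true standard-deviation
true_values = [random.gauss(0, 1) for _ in range(200_000)]
# Grouping: every observation is replaced by the centre of its class-interval
grouped = [c * math.floor(x / c + 0.5) for x in true_values]

raw, crude = var(true_values), var(grouped)
sheppard = crude - c * c / 12          # equation (4)
print(f"true {raw:.4f}  grouped {crude:.4f}  corrected {sheppard:.4f}")
```

The grouped variance exceeds the true one by roughly $c^2/12$, and subtracting that amount brings it back close to the ungrouped value.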

5. If certain observations be repeated so that we have in every
case two measures $x_1$ and $x_2$ of the same deviation $x$, it is possible
to obtain the true standard-deviation $\sigma$, if the further assumption
is legitimate that the errors $\delta_1$ and $\delta_2$ are uncorrelated with each
other. On this assumption
        <pb n="239" />
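$$\Sigma(x_1 x_2) = \Sigma(x + \delta_1)(x + \delta_2) = \Sigma(x^2),$$
and accordingly
$$\sigma^2 = \frac{\Sigma(x_1 x_2)}{N} = r_{x_1 x_2}\,\sigma_{x_1}\sigma_{x_2}. \qquad (5)$$
(This formula is part of Spearman’s formula for the correction of
the correlation-coefficient, cf. § 7.)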

6. Influence of Errors of Observation on the Correlation-coefficient.
—Let $x_1$, $y_1$ be the observed deviations from the arithmetic means,
$x$, $y$ the true deviations, and $\delta$, $\epsilon$ the errors of observation. Of
the four quantities $x$, $y$, $\delta$, $\epsilon$ we will suppose $x$ and $y$ alone to
be correlated. On this assumption
$$\Sigma(x_1 y_1) = \Sigma(xy).$$
It follows at once that
$$\frac{r_{x_1 y_1}}{r_{xy}} = \frac{\sigma_x \sigma_y}{\sigma_{x_1}\sigma_{y_1}}, \qquad (6)$$
and consequently, since by (3) $\sigma_{x_1}$ and $\sigma_{y_1}$ exceed $\sigma_x$ and $\sigma_y$
respectively, the observed correlation is less than the true
correlation. This difference, it should be noticed, no mere increase
in the number of observations can in any way lessen.

7. Spearman’s Theorems.—If, however, the observations of both
$x$ and $y$ be repeated, as assumed in § 5, so that we have two
measures $x_1$ and $x_2$, $y_1$ and $y_2$ of every value of $x$ and $y$, the true
value of the correlation can be obtained by the use of equations
(5) and (6), on assumptions similar to those made above. For
we have
$$r = \frac{\Sigma(x_1 y_1)}{\sqrt{\Sigma(x_1 x_2)\,\Sigma(y_1 y_2)}} = \frac{\Sigma(x_1 y_2)}{\sqrt{\Sigma(x_1 x_2)\,\Sigma(y_1 y_2)}}$$
$$= \frac{r_{x_1 y_1}}{\sqrt{r_{x_1 x_2}\, r_{y_1 y_2}}} = \frac{r_{x_1 y_2}}{\sqrt{r_{x_1 x_2}\, r_{y_1 y_2}}}, \qquad (7)$$
the last two forms assuming, further, that the repeated measures are
equally accurate, so that $\sigma_{x_1} = \sigma_{x_2}$ and $\sigma_{y_1} = \sigma_{y_2}$.
Or, if we use all the four possible correlations between observed
values of $x$ and observed values of $y$,
$$r = \frac{\sqrt[4]{r_{x_1 y_1}\, r_{x_1 y_2}\, r_{x_2 y_1}\, r_{x_2 y_2}}}{\sqrt{r_{x_1 x_2}\, r_{y_1 y_2}}}. \qquad (8)$$

Equation (8) is the original form in which Spearman gave his
correction formula (refs. 6, 7). It will be seen to imply the
assumption that, of the six quantities $x$, $y$, $\delta_1$, $\delta_2$, $\epsilon_1$, $\epsilon_2$, $x$ and $y$
alone are correlated. The correction given by the second part
of equation (7), also suggested by Spearman, seems, on the
        <pb n="240" />

whole, to be safer, for it eliminates the assumption that the errors
in $x$ and in $y$, in the same series of observations, are uncorrelated.
An insufficient though partial test of the correctness of the
assumptions may be made by correlating $x_1 - x_2$ with $y_1 - y_2$: this
correlation should vanish. Evidently, however, it may vanish
from symmetry without thereby implying that all the correlations
of the errors are zero.
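A simulation sketch of equations (5) to (8) follows. It assumes, hypothetically, normal true values with correlation 0.6 and two independent, equally accurate measures of each variable, so that the conditions of §§ 5-7 hold; the fourth root in (8) is taken only because all four cross-correlations are positive here. The observed correlation is attenuated, while both corrected values come back close to the true one.

```python
import math
import random

def cov(u, v):
    mu, mv = math.fsum(u) / len(u), math.fsum(v) / len(v)
    return math.fsum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)

def corr(u, v):
    return cov(u, v) / math.sqrt(cov(u, u) * cov(v, v))

random.seed(7)
N, rho, err_sd = 50_000, 0.6, 0.8
x = [random.gauss(0, 1) for _ in range(N)]
y = [rho * a + math.sqrt(1 - rho**2) * random.gauss(0, 1) for a in x]
# Two independent measurements of each variable; all errors mutually uncorrelated
x1 = [a + random.gauss(0, err_sd) for a in x]
x2 = [a + random.gauss(0, err_sd) for a in x]
y1 = [b + random.gauss(0, err_sd) for b in y]
y2 = [b + random.gauss(0, err_sd) for b in y]

r_true = corr(x, y)
r_observed = corr(x1, y1)                # attenuated, as equation (6) implies
sigma2_x_est = cov(x1, x2)               # equation (5): Sigma(x1 x2)/N estimates sigma_x^2
r_x1x2, r_y1y2 = corr(x1, x2), corr(y1, y2)
r_eq7 = corr(x1, y2) / math.sqrt(r_x1x2 * r_y1y2)
cross = [corr(x1, y1), corr(x1, y2), corr(x2, y1), corr(x2, y2)]
# equation (8): geometric mean of the four cross-correlations (all positive here)
r_eq8 = math.prod(cross) ** 0.25 / math.sqrt(r_x1x2 * r_y1y2)

print(f"true r {r_true:.3f}, observed {r_observed:.3f}, eq. (7) {r_eq7:.3f}, eq. (8) {r_eq8:.3f}")
print(f"sigma_x^2: true {cov(x, x):.3f}, from eq. (5) {sigma2_x_est:.3f}")
```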

8. Mean and Standard-deviation of an Index.—(Ref.11.) The
means and standard-deviations of non-linear functions of two or
more variables can in general only be expressed in terms of the
means and standard-deviations of the original variables to a first
approximation, on the assumption that deviations are small
compared with the mean values of the variables. Thus let it be
required to find the mean and standard-deviation of a ratio or
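index $Z = X_1/X_2$, in terms of the constants for $X_1$ and $X_2$. Let $I$
be the mean of $Z$, and $M_1$, $M_2$ the means of $X_1$ and $X_2$. Then
$$I = \frac{1}{N}\Sigma\left(\frac{X_1}{X_2}\right) = \frac{1}{N}\Sigma\left(\frac{M_1 + x_1}{M_2 + x_2}\right) = \frac{M_1}{N M_2}\Sigma\left\{\left(1 + \frac{x_1}{M_1}\right)\left(1 + \frac{x_2}{M_2}\right)^{-1}\right\}.$$
Expand the second bracket by the binomial theorem, assuming
that $x_2/M_2$ is so small that powers higher than the second can
be neglected. Then to this approximation, since $\Sigma(x_1) = \Sigma(x_2) = 0$,
$$I = \frac{M_1}{N M_2}\Sigma\left\{\left(1 + \frac{x_1}{M_1}\right)\left(1 - \frac{x_2}{M_2} + \frac{x_2^2}{M_2^2}\right)\right\} = \frac{M_1}{M_2}\left\{1 - \frac{\Sigma(x_1 x_2)}{N M_1 M_2} + \frac{\sigma_2^2}{M_2^2}\right\}.$$
That is, if $r$ be the correlation between $x_1$ and $x_2$, and if $v_1 = \sigma_1/M_1$,
$v_2 = \sigma_2/M_2$,
$$I = \frac{M_1}{M_2}\left(1 - r v_1 v_2 + v_2^2\right). \qquad (9)$$
If $s$ be the standard-deviation of $Z$ we have
$$s^2 + I^2 = \frac{1}{N}\Sigma\left(\frac{X_1}{X_2}\right)^2 = \frac{M_1^2}{N M_2^2}\Sigma\left\{\left(1 + \frac{x_1}{M_1}\right)^2\left(1 + \frac{x_2}{M_2}\right)^{-2}\right\}.$$
Expanding the second bracket again by the binomial theorem,
and neglecting terms of all orders above the second,
$$s^2 + I^2 = \frac{M_1^2}{N M_2^2}\Sigma\left\{\left(1 + \frac{2x_1}{M_1} + \frac{x_1^2}{M_1^2}\right)\left(1 - \frac{2x_2}{M_2} + \frac{3x_2^2}{M_2^2}\right)\right\}$$
$$= \frac{M_1^2}{M_2^2}\left(1 + v_1^2 - 4r v_1 v_2 + 3v_2^2\right),$$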
        <pb n="241" />
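or, subtracting the square of (9) and again neglecting terms above the second order,
$$s^2 = \frac{M_1^2}{M_2^2}\left(v_1^2 - 2r v_1 v_2 + v_2^2\right). \qquad (10)$$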

9. Correlation between Indices—(Ref.11.) The following prob-
lem affords a further illustration of the use of the same method.
Required to find approximately the correlation between two ratios
$Z_1 = X_1/X_3$, $Z_2 = X_2/X_3$, $X_1$, $X_2$ and $X_3$ being uncorrelated.

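Let the means of the two ratios or indices be $I_1$, $I_2$ and the
standard-deviations $s_1$, $s_2$; these are given approximately by (9)
and (10) of the last section. The required correlation $\rho$ will be
given by
$$N\rho s_1 s_2 = \Sigma\left(\frac{X_1}{X_3} - I_1\right)\left(\frac{X_2}{X_3} - I_2\right) = \Sigma\left(\frac{X_1 X_2}{X_3^2}\right) - N I_1 I_2$$
$$= \frac{M_1 M_2}{M_3^2}\Sigma\left\{\left(1 + \frac{x_1}{M_1}\right)\left(1 + \frac{x_2}{M_2}\right)\left(1 + \frac{x_3}{M_3}\right)^{-2}\right\} - N I_1 I_2.$$
Neglecting terms of higher order than the second as before and
remembering that all correlations are zero, so that by (9)
$I_1 = (M_1/M_3)(1 + v_3^2)$ and $I_2 = (M_2/M_3)(1 + v_3^2)$, we have
$$\rho s_1 s_2 = \frac{M_1 M_2}{M_3^2}\left\{(1 + 3v_3^2) - (1 + v_3^2)^2\right\} = \frac{M_1 M_2}{M_3^2}\, v_3^2,$$
where, in the last step, a term of the order $v_3^4$ has again been
neglected. Substituting from (10) for $s_1$ and $s_2$, we have finally
$$\rho = \frac{v_3^2}{\sqrt{(v_1^2 + v_3^2)(v_2^2 + v_3^2)}}. \qquad (11)$$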

This value of $\rho$ is obviously positive, being equal to 0.5 if
$v_1 = v_2 = v_3$; and hence even if $X_1$ and $X_2$ are independent, the
indices formed by taking their ratios to a common denominator $X_3$ will
be correlated. The value of $\rho$ is termed by Professor Pearson the
“spurious correlation.” Thus if measurements be taken, say, on
three bones of the human skeleton, and the measurements grouped
in threes absolutely at random, there will, nevertheless, be a
positive correlation, probably approaching 0.5, between the indices
formed by the ratios of two of the measurements to the third. To
give another illustration, if two individuals both observe the same
series of magnitudes quite independently, there may be little, if
        <pb n="242" />

any, correlation between their absolute errors. But if the errors
be expressed as percentages of the magnitude observed, there
may be considerable correlation. It does not follow of necessity
that the correlations between indices or ratios are misleading.
If the indices are uncorrelated, there will be a similar “spurious”
correlation between the absolute measurements $Z_1 X_3 = X_1$ and
$Z_2 X_3 = X_2$, and the answer to the question whether the correlation
between indices or that between absolute measures is misleading
depends on the further question whether the indices or the
absolute measures are the quantities directly determined by the
causes under investigation (cf. ref. 13).

The case considered, where $X_1$, $X_2$, $X_3$ are uncorrelated, is only
a special one; for the general discussion cf. ref. 11. For an
interesting study of actual illustrations cf. ref. 14.
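The “spurious correlation” of (11) appears readily in simulated data. This sketch assumes three independent normal variables with the same coefficient of variation (0.1), for which (11) gives 0.5; the absolute measurements are uncorrelated, but their ratios to the common denominator are not. At this value of $v$ the neglected higher-order terms shift the observed figure only slightly.

```python
import math
import random

def mean(v):
    return math.fsum(v) / len(v)

def sd(v):
    m = mean(v)
    return math.sqrt(math.fsum((a - m) ** 2 for a in v) / len(v))

def corr(u, v):
    mu, mv = mean(u), mean(v)
    cov = math.fsum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)
    return cov / (sd(u) * sd(v))

random.seed(11)
N = 100_000
X1 = [random.gauss(100, 10) for _ in range(N)]
X2 = [random.gauss(100, 10) for _ in range(N)]
X3 = [random.gauss(100, 10) for _ in range(N)]   # the common denominator
Z1 = [a / c for a, c in zip(X1, X3)]
Z2 = [b / c for b, c in zip(X2, X3)]

v1, v2, v3 = (sd(X) / mean(X) for X in (X1, X2, X3))
rho_11 = v3**2 / math.sqrt((v1**2 + v3**2) * (v2**2 + v3**2))   # equation (11)
print(f"r(X1, X2) = {corr(X1, X2):+.3f}")
print(f"r(Z1, Z2) = {corr(Z1, Z2):+.3f}, equation (11) gives {rho_11:.3f}")
```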

10. The Correlation-coefficient for a two- × two-fold Table.—The
correlation-coefficient is in general only calculated for a table with
a considerable number of rows and columns, such as those given
in Chapter IX. In some cases, however, a theoretical value is
obtainable for the coefficient, which holds good even for the limiting
case when there are only two values possible for each variable (e.g.
0 and 1) and consequently two rows and two columns (cf. one
illustration in § 11, and for others the references given in questions 11
and 12). It is therefore of some interest to obtain an expression
for the coefficient in this case in terms of the class-frequencies.

Using the notation of Chapters I.-IV. the table may be written
in the form

                                 Values of First Variable.
Values of Second Variable.       A          α          Total
           B                   (AB)       (αB)          (B)
           β                   (Aβ)       (αβ)          (β)
         Total                  (A)        (α)           N

Taking the centre of the table as arbitrary origin and the
class-interval, as usual, as the unit, the co-ordinates of the
mean are

$$\xi = \frac{1}{2N}\{(\alpha) - (A)\}, \qquad \eta = \frac{1}{2N}\{(\beta) - (B)\}.$$
        <pb n="243" />
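The standard-deviations $\sigma_1$, $\sigma_2$ are given by
$$\sigma_1^2 = 0.25 - \xi^2 = (A)(\alpha)/N^2,$$
$$\sigma_2^2 = 0.25 - \eta^2 = (B)(\beta)/N^2.$$
Finally,
$$\Sigma(xy) = \tfrac{1}{4}\{(AB) + (\alpha\beta) - (A\beta) - (\alpha B)\} - N\xi\eta.$$
Writing
$$(AB) - (A)(B)/N = \delta$$
(as in Chap. III. §§ 11-12) and replacing $\xi$, $\eta$ by their values,
this reduces to
$$\Sigma(xy) = \delta.$$
Whence
$$r = \frac{N\delta}{\sqrt{(A)(\alpha)(B)(\beta)}}. \qquad (12)$$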

This value of $r$ can be used as a coefficient of association, but,
unlike the association-coefficient of Chap. III. § 13, which is
unity if either $(AB)=(A)$ or $(AB)=(B)$, $r$ only becomes unity if
$(AB)=(A)=(B)$. This is the only case in which both frequencies
$(A\beta)$ and $(\alpha B)$ can vanish so that $(AB)$ and $(\alpha\beta)$ correspond to
the frequencies of two points $X_1 Y_1$, $X_2 Y_2$ on a line. Obviously
this alone renders the numerical values of the two coefficients
quite incomparable with each other. But further, while the
association coefficient is the same for all tables derived from one
another by multiplying rows or columns by arbitrary coefficients,
the correlation coefficient (12) is greatest when $(A)=(\alpha)$ and
$(B)=(\beta)$, i.e. when the table is symmetrical, and its value is
lowered when the symmetrical table is rendered asymmetrical by
increasing or reducing the number of 4’s or B’s. For moderate
degrees of association, the association coefficient gives much the
larger values. The two coefficients possess, in fact, essentially
different properties, and are different measures of association in
the same sense that the geometric and arithmetic means are
different forms of average, or the interquartile range and the
standard-deviation different measures of dispersion.
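Equation (12) and the contrast drawn above can be illustrated with a short Python sketch. The arguments follow the notation of the table, with $(A\beta)$ written `Ab`, $(\alpha B)$ written `aB` and $(\alpha\beta)$ written `ab`. The coefficient of Chap. III. § 13 is assumed to be $Q = \{(AB)(\alpha\beta) - (A\beta)(\alpha B)\}/\{(AB)(\alpha\beta) + (A\beta)(\alpha B)\}$, and the frequencies are invented. Multiplying the A column by 3 leaves $Q$ unchanged but lowers $r$; `direct_r` confirms that (12) is simply the product-moment coefficient of the individual 0/1 pairs.

```python
import math

def fourfold_r(AB, Ab, aB, ab):
    """Equation (12): r = N*delta / sqrt((A)(alpha)(B)(beta))."""
    N = AB + Ab + aB + ab
    A, alpha = AB + Ab, aB + ab
    B, beta = AB + aB, Ab + ab
    delta = AB - A * B / N
    return N * delta / math.sqrt(A * alpha * B * beta)

def direct_r(AB, Ab, aB, ab):
    """Ordinary product-moment r of the individual pairs, coding A and B as 1."""
    pairs = [(1, 1)] * AB + [(1, 0)] * Ab + [(0, 1)] * aB + [(0, 0)] * ab
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x, _ in pairs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for _, y in pairs) / n)
    return sxy / (sx * sy)

def yule_q(AB, Ab, aB, ab):
    """Association-coefficient Q (assumed to be that of Chap. III. § 13)."""
    return (AB * ab - Ab * aB) / (AB * ab + Ab * aB)

symmetrical = (40, 10, 10, 40)
stretched = (120, 30, 10, 40)      # the A column of the same table multiplied by 3
for table in (symmetrical, stretched):
    print(table, f"r = {fourfold_r(*table):.4f}", f"Q = {yule_q(*table):.4f}")
```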

The student is again referred to ref. 3 of Chap. III. for a
general discussion of various measures of association, including
these and others, that have been proposed.

11. The Correlation-coefficient for all possible pairs of N values
of a Variable.—In certain cases a correlation-table is formed by
combining N observations in pairs in all possible ways. If, for
example, a table is being formed to illustrate, say, the correlation
between brothers for stature, and there are three brothers in
        <pb n="244" />
one family with statures 5 ft. 9, 5 ft. 10, and 5 ft. 11, these are
regarded as giving the six pairs

5 ft. 9 with 5 ft. 10        5 ft. 10 with 5 ft. 9
5 ft. 9 with 5 ft. 11        5 ft. 11 with 5 ft. 9
5 ft. 10 with 5 ft. 11       5 ft. 11 with 5 ft. 10
which may be entered into the table. The entire table will be
formed from the aggregate of such subsidiary tables, each due to
one family. Let it be required to find the correlation-coefficient,
however, for a single subsidiary table, due to a family with $N$
members, the number of pairs being therefore $N(N-1)$.

As each observed value of the variable occurs $N-1$ times,
i.e. once in combination with every other value, the means and
standard-deviations of the totals of the correlation-table are the
same as for the original $N$ observations, say $M$ and $\sigma$. If $x_1$, $x_2$,
$x_3, \ldots$ be the observed deviations, the product sum may be
written
$$x_1 x_2 + x_1 x_3 + x_1 x_4 + \cdots$$
$$+\; x_2 x_1 + x_2 x_3 + x_2 x_4 + \cdots$$
$$+\; x_3 x_1 + x_3 x_2 + x_3 x_4 + \cdots$$
$$+\; \cdots$$
$$= x_1\{\Sigma(x) - x_1\} + x_2\{\Sigma(x) - x_2\} + x_3\{\Sigma(x) - x_3\} + \cdots$$
$$= -\Sigma(x^2) = -N\sigma^2,$$
since $\Sigma(x) = 0$; whence, there being $N(N-1)$ pairs,
$$r = -\frac{N\sigma^2}{N(N-1)\sigma^2} = -\frac{1}{N-1}. \qquad (13)$$
For $N = 2, 3, 4, \ldots$ this gives the successive values $r = -1, -\tfrac{1}{2}, -\tfrac{1}{3}, \ldots$.
It is clear that the first value is right, for two
values $x_1$, $x_2$ only determine the two points $(x_1, x_2)$ and $(x_2, x_1)$,
and the slope of the line joining them is negative.
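Equation (13) can be verified by forming every ordered pair directly. The sketch below uses Python's `itertools.permutations`, which treats equal values at different positions as distinct members, so the same function also covers the case of attributes coded 0 and 1 noted in the next paragraph. The statures are taken in inches.

```python
import itertools
import math

def all_pairs_r(values):
    """Product-moment r of the table formed from every ordered pair of distinct members."""
    pairs = list(itertools.permutations(values, 2))   # N(N - 1) ordered pairs
    n = len(pairs)
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a, _ in pairs) / n)
    sy = math.sqrt(sum((b - my) ** 2 for _, b in pairs) / n)
    return sxy / (sx * sy)

statures = [69, 70, 71]                  # 5 ft. 9, 5 ft. 10 and 5 ft. 11
print(round(all_pairs_r(statures), 6))   # -0.5, i.e. -1/(N - 1) with N = 3
print(round(all_pairs_r([1, 1, 0, 0, 0]), 6))   # attributes coded 0 and 1, N = 5
```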

The student should notice that a corresponding negative
association will arise between the first and second member of the
pair if all possible pairs are formed in a mixture of A’s and α’s.
Looking at the association, in fact, from the standpoint of § 10,
the equation (13) still holds, even if the variables can only assume
two values, e.g. 0 and 1. This result is utilised in § 14 of Chapter
XIV.

12. Correlation due to Heterogeneity of Material.—The following
theorem offers some analogy with the theorem of Chap. IV.
§ 6 for attributes.—If X and Y are uncorrelated in each of two
records, they will nevertheless exhibit some correlation when the

        <pb n="245" />
two records are mingled, unless the mean value of X in the
second record is identical with that in the first record, or the mean
value of Y in the second record is identical with that in the first
record, or both.

This follows almost at once, for if $M_1$, $M_2$ are the mean values of
$X$ in the two records, $K_1$, $K_2$ the mean values of $Y$, $N_1$, $N_2$ the
numbers of observations, and $M$, $K$ the means when the two
records are mingled, the product-sum of deviations about $M$, $K$ is,
since the product-sum about the means of each record vanishes,
$$N_1(M_1 - M)(K_1 - K) + N_2(M_2 - M)(K_2 - K).$$
Evidently the first term can only be zero if $M_1 = M$ or $K_1 = K$.
But the first condition gives
$$\frac{N_1 M_1 + N_2 M_2}{N_1 + N_2} = M_1,$$
that is, $M_1 = M_2$. Similarly, the second condition gives $K_1 = K_2$.
Further, since $N_1(M_1 - M) = -N_2(M_2 - M)$ and $N_1(K_1 - K) = -N_2(K_2 - K)$,
the two terms are always of the same sign and cannot cancel. Both the first
and second terms can, therefore, only vanish if $M_1 = M_2$ or
$K_1 = K_2$. Correlation may accordingly be created by the mingling
of two records in which $X$ and $Y$ vary round different means.
(For a more general form of the theorem cf. ref. 20.)
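The theorem can be checked exactly on a small constructed example. Each hypothetical record below pairs every X value with every Y value, so that X and Y are exactly uncorrelated within it. Mingling two such records with different means produces a marked correlation, and the direct product-sum agrees with the two-term expression above; when the X means coincide the correlation disappears.

```python
import math

def grid(xs, ys):
    """One hypothetical record: every X value paired once with every Y value,
    so that X and Y are exactly uncorrelated within the record."""
    return [(x, y) for x in xs for y in ys]

def product_sum(pairs):
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    return sum((x - mx) * (y - my) for x, y in pairs)

def corr(pairs):
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x, _ in pairs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for _, y in pairs) / n)
    return product_sum(pairs) / (n * sx * sy)

def mingled_product_sum(rec1, rec2):
    """N1(M1 - M)(K1 - K) + N2(M2 - M)(K2 - K), from the record means only."""
    both = rec1 + rec2
    M = sum(x for x, _ in both) / len(both)
    K = sum(y for _, y in both) / len(both)
    total = 0.0
    for rec in (rec1, rec2):
        Mi = sum(x for x, _ in rec) / len(rec)
        Ki = sum(y for _, y in rec) / len(rec)
        total += len(rec) * (Mi - M) * (Ki - K)
    return total

rec1 = grid([1, 2, 3], [10, 20, 30])    # M1 = 2, K1 = 20
rec2 = grid([5, 6, 7], [40, 50, 60])    # M2 = 6, K2 = 50
print(corr(rec1), corr(rec2), round(corr(rec1 + rec2), 3))   # 0.0 0.0 0.813
```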

13. Reduction of Correlation due to mingling of uncorrelated
with correlated pairs.—Suppose that $n_1$ observations of $x$ and $y$
give a correlation-coefficient
$$r_1 = \frac{\Sigma(xy)}{n_1\sigma_x\sigma_y}.$$
Now let $n_2$ pairs be added to the material, the means and
standard-deviations of $x$ and $y$ being the same as in the first
series of observations, but the correlation zero. The value of
$\Sigma(xy)$ will then be unaltered, and we will have
$$r = \frac{\Sigma(xy)}{(n_1 + n_2)\sigma_x\sigma_y}.$$
Whence
$$\frac{r}{r_1} = \frac{n_1}{n_1 + n_2}. \qquad (14)$$
Suppose, for example, that a number of bones of the human
skeleton have been disinterred during some excavations, and
a correlation $r_2$ is observed between pairs of bones presumed
to come from the same skeleton, this correlation being rather
lower than might have been expected, and subject to some
uncertainty owing to doubts as to the allocation of certain
bones. If $r_1$ is the value that would be expected from other
records, the difference might be accounted for on the hypothesis
        <pb n="246" />

that, in a proportion $(r_1 - r_2)/r_1$ of all the pairs, the bones do
not really belong to the same skeleton, and have been virtually
paired at random. (For a more general form of the theorem cf.
again ref. 20.)
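A simulation sketch of equation (14), under assumed conditions: 30,000 genuinely paired observations with correlation 0.5, and 10,000 pairs formed at random from variables with the same means and standard-deviations. The helper `random_pair_fraction` is a name introduced here for illustration; it applies the reasoning of the skeleton example, $(r_1 - r_2)/r_1$.

```python
import math
import random

def corr(u, v):
    n = len(u)
    mu, mv = math.fsum(u) / n, math.fsum(v) / n
    suv = math.fsum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = math.fsum((a - mu) ** 2 for a in u)
    svv = math.fsum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)

def random_pair_fraction(r_expected, r_observed):
    """Proportion of randomly paired observations implied by equation (14)."""
    return (r_expected - r_observed) / r_expected

random.seed(5)
n1, n2, rho = 30_000, 10_000, 0.5
xs, ys = [], []
for _ in range(n1):                      # genuinely paired observations
    a = random.gauss(0, 1)
    xs.append(a)
    ys.append(rho * a + math.sqrt(1 - rho**2) * random.gauss(0, 1))
for _ in range(n2):                      # pairs formed at random: same margins, no correlation
    xs.append(random.gauss(0, 1))
    ys.append(random.gauss(0, 1))

r_mixed = corr(xs, ys)
print(f"observed r {r_mixed:.3f}; equation (14) predicts {rho * n1 / (n1 + n2):.3f}")
print(f"implied fraction of random pairs {random_pair_fraction(rho, r_mixed):.3f}"
      f" (actual {n2 / (n1 + n2):.3f})")
```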

14. The Weighted Mean.—The arithmetic mean M of a series
of values of a variable X was defined as the quotient of the sum
of those values by their number $N$, or
$$M = \Sigma(X)/N.$$

If, on the other hand, we multiply each several observed
value of $X$ by some numerical coefficient or weight $W$, the
quotient of the sum of such products by the sum of the weights
is defined as a weighted mean of $X$, and may be denoted by $M'$,
so that
$$M' = \Sigma(W.X)/\Sigma(W).$$

The distinction between “weighted” and “unweighted” means
is, it should be noted, very often formal rather than essential,
for the “weights” may be regarded as actual, estimated, or
virtual frequencies. The weighted mean then becomes simply
an arithmetic mean, in which some new quantity is regarded
as the unit. Thus if we are given the means $M_1, M_2, M_3, \ldots, M_r$
of $r$ series of observations, but do not know the number
of observations in every series, we may form a general average
by taking the arithmetic mean of all the means, viz. $\Sigma(M)/r$,
treating the series as the unit. But if we know the number
of observations in every series it will be better to form the
weighted mean $\Sigma(NM)/\Sigma(N)$, weighting each mean in proportion
to the number of observations in the series on which it is based.
The second form of average would be quite correctly spoken
of as a weighted mean of the means of the several series: at
the same time it is simply the arithmetic mean of all the
series pooled together, i.e. the arithmetic mean obtained by
treating the observation and not the series as the unit.
(Chap. VII. § 13.)
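That the weighted mean of the series means is simply the mean of the pooled observations is easily seen with exact arithmetic. The three series in this sketch are invented, and Python's `fractions.Fraction` is used so that the comparison is exact rather than approximate.

```python
from fractions import Fraction

series = [[2, 4, 6], [10, 12], [1, 1, 1, 1, 1]]       # hypothetical observations
means = [Fraction(sum(s), len(s)) for s in series]    # M1, M2, M3
sizes = [len(s) for s in series]                      # N for each series

unweighted = sum(means) / len(means)                                # Sigma(M)/r
weighted = sum(n * m for n, m in zip(sizes, means)) / sum(sizes)    # Sigma(NM)/Sigma(N)
pooled = Fraction(sum(sum(s) for s in series), sum(sizes))         # all observations together
print(unweighted, weighted, pooled)   # 16/3 39/10 39/10
```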

15. To give an arithmetical illustration, if a commodity is sold
at different prices in different markets, it will be better to form
an average price, not by taking the arithmetic mean of the several
market prices, treating the market as the unit, but by weighting
each price in proportion to the quantity sold at that price, if
known, 7.e. treating the unit of quantity as the unit of frequency.
Thus if wheat has been sold in market A at an average price of
29s. 1d. per quarter, in market B at an average price of 27s. 7d.,
and in market C at an average price of 28s. 4d., we may, if no
statement is made as to the quantities sold at these prices (as very

        <pb n="247" />
often happens in the case of statements as to market prices), take
the arithmetic mean (28s. 4d.) as the general average. But if we
know that 23,930 qrs. were sold at A, only 26 qrs. at B, and 3933
qrs. at C, it will be better to take the weighted mean
$$\frac{(29\text{s. }1\text{d.} \times 23{,}930) + (27\text{s. }7\text{d.} \times 26) + (28\text{s. }4\text{d.} \times 3933)}{27{,}889} = 29\text{s. }0\text{d.}$$
to the nearest penny. This is appreciably higher than the
arithmetic mean price, which is lowered by the undue importance
attached to the small markets B and C.
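The arithmetic of the wheat example can be reproduced by working in pence (12d. to the shilling). The sketch below is only a convenience for checking the figures quoted above; the helper names are illustrative.

```python
def pence(shillings, pennies):
    return 12 * shillings + pennies

def s_d(amount):
    shillings, pennies = divmod(round(amount), 12)   # to the nearest penny
    return f"{shillings}s. {pennies}d."

price = {"A": pence(29, 1), "B": pence(27, 7), "C": pence(28, 4)}   # per quarter
quarters = {"A": 23_930, "B": 26, "C": 3_933}

simple_mean = sum(price.values()) / len(price)
weighted_mean = sum(price[m] * quarters[m] for m in price) / sum(quarters.values())
print(s_d(simple_mean), "|", s_d(weighted_mean))   # 28s. 4d. | 29s. 0d.
```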

In the case of index-numbers for exhibiting the changes in
average prices from year to year (cf. Chap. VII. § 25), it may
make a sensible difference whether we take the simple arithmetic
mean of the index-numbers for different commodities in any one
year as representing the price-level in that year, or weight the
index-numbers for the several commodities according to their
importance from some point of view ; and much has been written
as to the weights to be chosen. If, for example, our standpoint
be that of some average consumer, we may take as the weight for
each commodity the sum which he spends on that commodity in
an average year, so that the frequency of each commodity is
taken as the number of shillings or pounds spent thereon instead
of simply as unity.

Rates or ratios like the birth-, death-, or marriage-rates of a
country may be regarded as weighted means. For, treating the
rate for simplicity as a fraction, and not as a rate per 1000 of the
population,

$$\text{Birth-rate of whole country} = \frac{\Sigma(\text{births in each district})}{\Sigma(\text{population of each district})}$$
$$= \frac{\Sigma(\text{birth-rate in each district} \times \text{population in that district})}{\Sigma(\text{population of each district})},$$
i.e. the rate for the whole country is the mean of the rates in the
different districts, weighting each in proportion to its population.
We use the weighted and unweighted means of such rates as
illustrations in § 17 below.

16. It is evident that any weighted mean will in general differ
from the unweighted mean of the same quantities, and it is
required to find an expression for this difference. If $r$ be the
correlation between weights and variables, $\sigma_w$ and $\sigma_x$ the
standard-deviations, and $\bar{w}$ the mean weight, we have at once
$$\Sigma(W.X) = N(M\bar{w} + r\sigma_w\sigma_x),$$
whence
$$M' = M + r\,\frac{\sigma_w\sigma_x}{\bar{w}}. \qquad (15)$$
        <pb n="248" />

That is to say, if the weights and variables are positively correlated,
the weighted mean is the greater; if negatively, the less. In some
cases $r$ is very small, and then weighting makes little difference,
but in others the difference is large and important, $r$ having a
sensible value and $\sigma_w\sigma_x/\bar{w}$ a large value.
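Equation (15) is an exact identity when the standard-deviations and the correlation are computed with divisor N. The sketch below checks it on invented district figures in which, as in the pauperism figures of § 17, the larger districts have the lower rates, so that r is negative and the weighted mean falls below the unweighted.

```python
import math

def weighted_mean_two_ways(W, X):
    """Return M, M' computed directly, and M' from equation (15)."""
    n = len(X)
    M, w_bar = sum(X) / n, sum(W) / n
    s_x = math.sqrt(sum((x - M) ** 2 for x in X) / n)
    s_w = math.sqrt(sum((w - w_bar) ** 2 for w in W) / n)
    r = sum((w - w_bar) * (x - M) for w, x in zip(W, X)) / (n * s_w * s_x)
    direct = sum(w * x for w, x in zip(W, X)) / sum(W)    # Sigma(W.X)/Sigma(W)
    by_15 = M + r * s_w * s_x / w_bar                     # equation (15)
    return M, direct, by_15

populations = [2_000, 15_000, 40_000, 120_000, 500_000]  # invented weights
rates = [6.0, 4.5, 3.8, 3.0, 2.2]                        # invented percentages
M, direct, by_15 = weighted_mean_two_ways(populations, rates)
print(f"unweighted {M:.3f}  weighted {direct:.3f}  by (15) {by_15:.3f}")
```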

17. The difference between weighted and unweighted means
of death-rates, birth-rates or other rates on the population in
different districts is, for instance, nearly always of importance.
Thus we have the following figures for rates of pauperism
(Jour. Stat. Soc., vol. lix. (1896), p. 349).

Percentages of the Population in receipt of Relief, January 1.

Year     Arithmetic Mean of Rates     England and Wales
         in different Districts.      as a whole.
1850              6.51                      5.80
1860              5.20                      4.26
1870              5.45                      4.77
1881              3.68                      3.12
1891              3.29                      2.69

In this case the weighted mean is markedly the less, and the
correlation between the population of a district and its pauperism
must therefore be negative, the larger (on the whole urban) dis-
tricts having the lower percentage in receipt of relief. On the
other hand, for the decade 1881-90 the average birth-rate for
England and Wales was 32.34 per thousand, the arithmetic
mean of the rates for the different districts 30.34 only. The
weighted mean was therefore the greater, the birth-rate being
higher in the more populous (urban) districts, in which there is
a greater proportion of young married persons.

For the year 1891 the average population of a Poor-law district
was found to be roughly 45,900 and the standard-deviation $\sigma_w$
56,400 (populations ranging from under 2000 to over half a
million). The standard-deviation $\sigma_x$ of the percentages of the
population in receipt of relief was 1.24. We have therefore,
for the correlation between pauperism and population,
$$r = -\frac{3.29 - 2.69}{1.24} \times \frac{459}{564} = -0.39.$$

        <pb n="249" />

For the birth-rate, on the other hand, assuming that $\sigma_w/\bar{w}$
is approximately the same for the decade 1881-90 as in 1891,
we have, $\sigma_x$ being 4.08,
$$r = +\frac{32.34 - 30.34}{4.08} \times \frac{459}{564} = +0.40.$$

The closeness of the numerical values of $r$ in the two cases is,
of course, accidental.
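The two correlations follow from equation (15) solved for r, namely $r = (M' - M)\,\bar{w}/(\sigma_w\sigma_x)$. The Python sketch below simply repeats the arithmetic with the figures quoted in the text.

```python
def r_from_means(weighted, unweighted, sd_x, mean_w, sd_w):
    """Equation (15) solved for r: r = (M' - M) * w_bar / (sd_w * sd_x)."""
    return (weighted - unweighted) * mean_w / (sd_w * sd_x)

# Poor-law districts, 1891: mean population 45,900, standard-deviation 56,400
r_pauperism = r_from_means(2.69, 3.29, 1.24, 45_900, 56_400)
r_birth = r_from_means(32.34, 30.34, 4.08, 45_900, 56_400)
print(round(r_pauperism, 2), round(r_birth, 2))   # -0.39 0.4
```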

18. The principle of weighting finds one very important
application in the treatment of such rates as death-rates, which
are largely affected by the age and sex-composition of the popula-
tion. Neglecting, for simplicity, the question of sex, suppose the
numbers of deaths are noted in a certain district for, say, the
age-groups 0-, 10-, 20-, etc., in which the fractions of the whole
population are $p_0$, $p_1$, $p_2$, etc., where $\Sigma(p) = 1$. Let the
death-rates for the corresponding age-groups be $d_0$, $d_1$, $d_2$, etc. Then
the ordinary or crude death-rate for the district is
$$D = \Sigma(d.p). \qquad (16)$$

For some other district taken as a basis of comparison, perhaps
the country as a whole, the death-rates and fractions of the
population in the several age-groups may be $\delta_0, \delta_1, \delta_2, \ldots$ and
$\pi_0, \pi_1, \pi_2, \ldots$, and the crude death-rate
$$\Delta = \Sigma(\delta.\pi). \qquad (17)$$

Now $D$ and $\Delta$ may differ either because the $d$’s and $\delta$’s differ
or because the $p$’s and $\pi$’s differ, or both. It may happen that
really both districts are about equally healthy, and the
death-rates approximately the same for all age-classes, but, owing to a
difference of weighting, the first average may be markedly higher
than the second, or vice versa. If the first district be a rural
district and the second urban, for instance, there will be a larger
proportion of the old in the former, and it may possibly have a
higher crude death-rate than the second, in spite of lower
death-rates in every class. The comparison of crude death-rates is
therefore liable to lead to erroneous conclusions. The difficulty
may be got over by averaging the age-class death-rates in the
district not with the weights $p_0, p_1, p_2, \ldots$ given by its own
population, but with the weights $\pi_0, \pi_1, \pi_2, \ldots$ given by the
population of the standard district. The standardised death-rate
for the district will then be
$$D' = \Sigma(d.\pi) \qquad (18)$$
        <pb n="250" />
        . THEORY OF STATISTICS.

and D' and Δ will be comparable as regards age-distribution.
There is obviously no difficulty in taking sex into account as well
as age if necessary. The death-rates must be noted for each sex
separately in every age-class and averaged with a system of
weights based on the standard population. The method is also
of importance for comparing death-rates in different classes of the
population, e.g. those engaged in given occupations, as well as in
different districts, and is used for both these purposes in the
Decennial Supplements to the Reports of the Registrar General
for England and Wales (ref. 16).
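A small numerical sketch of eqns. (16)-(18) makes the rural/urban paradox concrete. The three age-classes and all rates and weights below are hypothetical. They are chosen only so that the district has the lower rate in every class but an older population than the standard.

```python
def weighted_rate(rates, weights):
    """Sigma(rate . weight): age-class rates averaged with fractional weights."""
    return sum(r * w for r, w in zip(rates, weights))


# Hypothetical figures for three age-classes; rates per 1000, weights sum to 1.
d = [4.0, 8.0, 60.0]        # district rates: lower than the standard in every class
p = [0.30, 0.45, 0.25]      # district weights: a larger proportion of the old
delta = [5.0, 10.0, 70.0]   # standard-population rates
pi = [0.40, 0.50, 0.10]     # standard-population weights

D = weighted_rate(d, p)             # crude rate of the district, eqn (16)
Delta = weighted_rate(delta, pi)    # crude rate of the standard, eqn (17)
D_std = weighted_rate(d, pi)        # standardised rate D', eqn (18)
print(round(D, 2), round(Delta, 2), round(D_std, 2))  # 19.8 14.0 11.6
```

The crude comparison (19·8 against 14·0) points the wrong way. Once both sets of class rates are averaged with the same weights, the district's rate (11·6) is the lower, as each of its class rates is.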

19. Difficulty may arise in practical cases from the fact that
the death-rates d₀, d₁, d₂, . . . . are not known for the districts or
classes which it is desired to compare with the standard popula-
tion, but only the crude rates D and the fractional populations
of the age-classes p₀, p₁, p₂, . . . . The difficulty may be partially
obviated (cf. Chap. IV. § 9, pp. 51-3), by forming what is
termed an index death-rate Δ' for the class or district, Δ' being
given by

$$\Delta' = \Sigma(\delta \cdot p) \qquad (19)$$

i.e. the rates of the standard population averaged with the
weights of the district population. It is the crude death-rate
that there would be in the district if the rate in every age-
class were the same as in the standard population. An
approximate standardised death-rate for the district or class is
then given by

$$D'' = D\,\frac{\Delta}{\Delta'} \qquad (20)$$

D'' is not necessarily, nor generally, the same as D'. It can
only be the same if

$$\frac{\Sigma(d \cdot \pi)}{\Sigma(d \cdot p)} = \frac{\Sigma(\delta \cdot \pi)}{\Sigma(\delta \cdot p)}$$

This will hold good if, e.g., the death-rates in the standard
population and the district stand to one another in the same
ratio in all age-classes, i.e. δ₀/d₀ = δ₁/d₁ = δ₂/d₂ = etc. This method
of standardisation is used in the Annual Summaries of the
Registrar-General for England and Wales.
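The index death-rate method can be sketched in the same way. The figures are again hypothetical. In the first case the district rates are proportional to the standard rates (δ/d = 1·25 in every class), so D'' should reproduce D' exactly. In the second case they are not proportional, and the two rates then differ. The helper names are illustrative only.

```python
def weighted_rate(rates, weights):
    """Sigma(rate . weight) over the age-classes."""
    return sum(r * w for r, w in zip(rates, weights))


def index_standardised_rate(crude_rate, std_rates, district_weights, std_weights):
    """Approximate standardised rate D'' = D * Delta / Delta', eqn (20).
    Only the district's crude rate and age-distribution are needed."""
    Delta = weighted_rate(std_rates, std_weights)             # eqn (17)
    Delta_index = weighted_rate(std_rates, district_weights)  # eqn (19)
    return crude_rate * Delta / Delta_index


p = [0.30, 0.45, 0.25]      # district weights (hypothetical)
pi = [0.40, 0.50, 0.10]     # standard weights (hypothetical)
delta = [5.0, 10.0, 70.0]   # standard rates (hypothetical)

# Case 1: district rates proportional to the standard rates.
d_prop = [4.0, 8.0, 56.0]
D2_prop = index_standardised_rate(weighted_rate(d_prop, p), delta, p, pi)
D1_prop = weighted_rate(d_prop, pi)          # direct D', eqn (18)

# Case 2: district rates not proportional to the standard rates.
d_other = [4.0, 8.0, 60.0]
D2_other = index_standardised_rate(weighted_rate(d_other, p), delta, p, pi)
D1_other = weighted_rate(d_other, pi)

print(round(D2_prop, 3), round(D1_prop, 3))    # 11.2 11.2
print(round(D2_other, 3), round(D1_other, 3))  # 11.796 11.6
```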

Both methods of standardisation —that of § 18 and that of the
present section—are of great and growing importance. They are
obviously applicable to other rates besides death-rates, e.g. birth-
rates (cf. refs. 17, 18). Further, they may readily be extended
into quite different fields. Thus it has been suggested (ref. 19)
that standardised average heights or standardised average weights

        <pb n="251" />
of the children in different schools might be obtained on the
basis of a standard school population of given age and sex
composition, or indeed of given composition as regards hair and
eye-colour as well.

20. In §§ 14-17 we have dealt only with the theory of
the weighted arithmetic mean, but it should be noted that
any form of average can be weighted. Thus a weighted median
can be formed by finding the value of the variable such that
the sum of the weights of lesser values is equal to the sum
of the weights of greater values. A weighted mode could be
formed by finding the value of the variable for which the sum
of the weights was greatest, allowing for the smoothing of
casual fluctuations. Similarly, a weighted geometric mean could
be calculated by weighting the logarithms of every value of the
variable before taking the arithmetic mean, i.e.

$$\log G_w = \frac{\Sigma(w \cdot \log X)}{\Sigma(w)}$$
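The weighted median and weighted geometric mean just described can be sketched as follows. This is a minimal illustration with hypothetical helper names. For the median it takes the first value, in order of magnitude, at which the cumulative weight reaches half the total. That is one way of locating, for discrete data, the point where the weights of lesser and greater values balance.

```python
import bisect
import math
from itertools import accumulate


def weighted_median(values, weights):
    """Weighted median (hypothetical helper): the first value, in order of
    magnitude, at which the cumulative weight reaches half the total weight.
    With an exact tie at one half, the lower of the two middle values is returned."""
    pairs = sorted(zip(values, weights))
    cumulative = list(accumulate(w for _, w in pairs))
    half = cumulative[-1] / 2
    return pairs[bisect.bisect_left(cumulative, half)][0]


def weighted_geometric_mean(values, weights):
    """Weighted geometric mean: log G = Sigma(w . log X) / Sigma(w)."""
    log_g = sum(w * math.log(x) for x, w in zip(values, weights)) / sum(weights)
    return math.exp(log_g)


print(weighted_median([3, 1, 2], [1, 1, 1]))         # 2 (the ordinary median)
print(weighted_median([1, 2, 3, 10], [1, 1, 1, 5]))  # 10
print(weighted_geometric_mean([2, 8], [1, 1]))       # close to 4
```

With equal weights both functions reduce to the unweighted median and geometric mean, which is a convenient check on any implementation.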
REFERENCES.
Effect of Grouping Observations.

(1) SHEPPARD, W. F., “On the Calculation of the Average Square, Cube, etc.,
of a large number of Magnitudes,” Jour. Roy. Stat. Soc., vol. lx., 1897,
p. 698.

(2) SHEPPARD, W. F., “On the Calculation of the most probable Values of
Frequency Constants for Data arranged according to Equidistant
Divisions of a Scale,” Proc. Lond. Math. Soc., vol. xxix. p. 353. (The
result given in eqn. (4) for the correction of the standard-deviation is
Sheppard’s result.)

(3) SHEPPARD, W. F., “The Calculation of Moments of a Frequency-distribu-
tion,” Biometrika, vol. v., 1907, p. 450.

(4) PEARSON, KARL, and others [editorial], “On an Elementary Proof of
Sheppard’s Formulæ for correcting Raw Moments, and on other allied
points,” Biometrika, vol. iii., 1904, p. 308.

(5) PEARSON, KARL, “On the Influence of ‘Broad Categories’ on Correlation,”
Biometrika, vol. ix., 1913, pp. 116-139.

Effect of Errors of Observation on the Correlation-coefficient.

(6) SPEARMAN, C., “The Proof and Measurement of Association between Two
Things,” Amer. Jour. of Psychology, vol. xv., 1904, p. 88.
(Formula (8).)

(7) SPEARMAN, C., “Demonstration of Formulæ for True Measurement of
Correlation,” Amer. Jour. of Psychology, vol. xviii., 1907, p. 161.
(Proof of formula (8), but on different lines to that given in the text,
which was communicated to Spearman in 1908, and published by
Brown and by Spearman in (8) and (10).)

(8) SPEARMAN, C., “Correlation calculated from Faulty Data,” British Jour.
of Psychology, vol. iii., 1910, p. 271.
        <pb n="252" />

(9) JACOB, S. M., “On the Correlations of Areas of Matured Crops and the
Rainfall,” Mem. Asiatic Soc. Bengal, vol. ii., 1910, p. 847. (This
contains remarks on the effects of errors on the correlations and
regressions, with especial reference to this problem.)

(10) BROWN, W., “Some Experimental Results in Correlation,” Proceedings
of the Sixth International Congress of Psychology, Geneva, August 1909.

Correlations between Indices, etc.

(11) PEARSON, KARL, “On a Form of Spurious Correlation which may arise
when Indices are used in the Measurement of Organs,” Proc. Roy. Soc.,
vol. lx., 1897, p. 489. (§§ 8, 9.)

(12) GALTON, FRANCIS, “Note to the Memoir by Prof. Karl Pearson on
Spurious Correlation,” ibid., p. 498.

(13) YULE, G. U., “On the Interpretation of Correlations between Indices or
Ratios,” Jour. Roy. Stat. Soc., vol. lxxiii., 1910, p. 644.

(14) BROWN, J. W., M. GREENWOOD, and FRANCES WOOD, “A Study of
Index-Correlations,” Jour. Roy. Stat. Soc., vol. lxxvii., 1914, pp. 317-46.

The Weighted Mean.

(15) PEARSON, KARL, “Note on Reproductive Selection,” Proc. Roy. Soc.,
vol. lix., 1896, p. 301. (Eqn. (15).)

Standardisation or Correction of Death-rates, etc.

(16) TATHAM, JOHN, Supplement to the Fifty-fifth Annual Report of the
Registrar-General for England and Wales: Introductory Letters to
Pt. I. and Pt. II. Also Supplement to Sixty-fifth Report: Introductory
Letter to Pt. II. (Cd. 7769, 1895; 8503, 1897; 2619, 1908.)

(17) NEWSHOLME, A., and T. H. C. STEVENSON, “The Decline of Human
Fertility in the United Kingdom and other Countries, as shown by
Corrected Birth-rates,” Jour. Roy. Stat. Soc., vol. lxix., 1906, p. 34.

(18) YULE, G. U., “On the Changes in the Marriage and Birth Rates in
England and Wales during the past Half Century,” etc., ibid., p. 88.

(19) HERON, DAVID, “The Influence of Defective Physique and Unfavourable
Home Environment on the Intelligence of School Children,” Eugenics
Laboratory Memoirs, viii.; Dulau &amp; Co., London, 1910.

Miscellaneous.

(20) PEARSON, KARL, ALICE LEE, and L. BRAMLEY-MOORE, “Genetic
(reproductive) Selection: Inheritance of Fertility in Man and of
Fecundity in Thoroughbred Racehorses,” Phil. Trans. Roy. Soc.,
Series A, vol. cxcii., 1899, p. 257.

(A number of theorems of general application are given in the intro-
ductory part of this memoir, some of which have been utilised in §§ 12-13
of the preceding chapter.)

EXERCISES.

1. Find the values obtained for the standard-deviations in Examples ii.
(p. 139) and iii. (p. 141) of Chapter VIII. on applying Sheppard’s correction
for grouping.

2. Show that if a range of six times the standard-deviation covers at least 18
class-intervals (cf. Chap. VI. § 5), Sheppard’s correction will make a difference
of less than 0·5 per cent. in the rough value of the standard-deviation.

3. (Data from the decennial supplements to the Annual Reports of the
Registrar-General for England and Wales.) The following particulars are

        <pb n="253" />

found for 36 small registration districts in which the number of births in a
decade ranged between 1500 and 2500:—

                    Proportion of Male Births per 1000 of all Births.
Decade.             Mean.          Standard-deviation.
1881-1890           508·1          12·80
1891-1900           508·4          10·37
Both decades        508·25         11·65

It is believed, however, that a great part of the observed standard-deviation
is due to mere “fluctuations of sampling” of no real significance.

Given that the correlation between the proportions of male births in a
district in the two decades is +0·36, estimate (1) the true standard-deviation
freed from such fluctuations of sampling; (2) the standard-deviation of fluctua-
tions of sampling, i.e. of the errors produced by such fluctuations in the observed
proportions of male births.

4. (Data from Pearson, ref. 11.) The coefficients of variation for breadth,
height, and length of certain skulls are 3·89, 3·50, and 3·24 per cent. respec-
tively. Find the “spurious correlation” between the breadth/length and
height/length indices, absolute measures being combined at random so that
they are uncorrelated.

5. (Data from Boas, communicated to Pearson: cf. Fawcett and Pearson,
Proc. Roy. Soc., vol. lxii. p. 413.) From short series of measurements on
American Indians the mean coefficient of correlation found between father and
son, and father and daughter, for cephalic index, is 0·14; between mother and
son, and mother and daughter, 0·33. Assuming these coefficients should be
the same if it were not for the looseness of family relations, find the proportion
of children not due to the reputed father.

6. Find the correlation between X₁ + X₂ and X₂ + X₃; X₁, X₂ and X₃ being
uncorrelated.

7. Find the correlation between X₁ and aX₁ + bX₂, X₁ and X₂ being
uncorrelated.

8. (Referring to illustration iv., § 14, Chap. X.) Use the answer to
question 7 to estimate, very roughly, the correlation that would be found
between annual movements in infantile and general mortality if the mortality
of those under and over 1 year of age were uncorrelated. Note that—

deaths per 1000 of population = infantile mortality per 1000 births × (births/population)
+ deaths over one year per 1000 of population,

and treat the ratio of births to population as if it were constant at a rough
average value, say 0·033. The standard-deviation of annual movements in
infantile mortality is (loc. cit.) 9·6, and that of annual movements in mortality
other than infantile may be taken as sensibly the same as that of general
mortality, or say 1 unit.

9. If the relation

$$a \cdot x_1 + b \cdot x_2 + c \cdot x_3 = 0$$
        <pb n="254" />

holds for all values of x₁, x₂ and x₃ (which are, in our usual notation,
deviations from their respective arithmetic means), find the correlations
between x₁, x₂ and x₃ in terms of their standard-deviations and the values of
a, b and c.

10. What is the effect on a weighted mean of errors in the weights or the
quantities weighted, such errors being uncorrelated with each other, with the
weights, or with the variables—(1) if the arithmetic mean values of the errors
are zero ; (2) if the arithmetic mean values of the errors are not zero ?

11. (Cf. Pearson, “On a Generalised Theory of Alternative Inheritance,”
Phil. Trans., vol. cciii., A, 1904, p. 53.) If we consider the correlation
between number of recessive couplets in parent and in offspring, in a
Mendelian population breeding at random (such as would ultimately result
from an initial cross between a pure dominant and a pure recessive), the
correlation is found to be 1/3 for a total number of couplets n. If n = 1, the
only possible numbers of recessive couplets are 0 and 1, and the correlation
table between parent and offspring reduces to the form

[Table: Offspring (rows) by Parent (columns), numbers of recessive couplets 0 and 1, with totals.]

Verify the correlation, and work out the association coefficient Q.

12. (Cf. the above, and also Snow, Proc. Roy. Soc., vol. lxxxiii., B, 1910,
Table III., p. 42.) For a similar population the correlation between
brothers, assuming a practically infinite size of family, is 5/12. The table is

[Table: Second Brother (rows) by First Brother (columns), with totals.]

Verify the correlation, and work out the association coefficient Q.

13. Referring to the notation of § 10, show that we have the following
expressions for the regressions in a fourfold table:—

$$r\,\frac{\sigma_1}{\sigma_2} = \frac{(AB)}{(B)} - \frac{(A\beta)}{(\beta)}$$

$$r\,\frac{\sigma_2}{\sigma_1} = \frac{(AB)}{(A)} - \frac{(\alpha B)}{(\alpha)}$$

Verify on the tables of questions 11 and 12.
        <pb n="255" />
        CHAPTER XII.
PARTIAL CORRELATION.

1-2. Introductory explanation—3. Direct deduction of the formulæ for two
variables—4. Special notation for the general case : generalised re-
gressions—5. Generalised correlations—6. Generalised deviations and
standard-deviations—7-8. Theorems concerning the generalised pro-
duct-sums—9. Direct interpretation of the generalised regressions—
10-11. Reduction of the generalised standard-deviation—12. Reduc-
tion of the generalised regression—13. Reduction of the generalised
correlation-coefficient—14. Arithmetical work : Example i. : Example
ii.—15. Geometrical representation of correlation between three
variables by means of a model—16. The coefficient of n-fold correlation
—17. Expression of regressions and correlations of lower in terms of
those of higher order—18. Limiting inequalities between the values of
correlation-coefficients necessary for consistence—19. Fallacies.

1. In Chapters IX.-XI. the theory of the correlation-coefficient for
a single pair of variables has been developed and its applications
illustrated. But in the case of statistics of attributes we found
it necessary to proceed from the theory of simple association for
a single pair of attributes to the theory of association for several
attributes, in order to be able to deal with the complex causation
characteristic of statistics; and similarly the student will find it
impossible to advance very far in the discussion of many problems
in correlation without some knowledge of the theory of multiple
correlation, or correlation between several variables. In such a
problem as that of illustration i., Chap. X., for instance, it might
be found that changes in pauperism were highly correlated
(positively) with changes in the out-relief ratio, and also with
changes in the proportion of old; and the question might arise how
far the first correlation was due merely to a tendency to give out-
relief more freely to the old than the young, i.e. to a correlation
between changes in out-relief and changes in proportion of old.
The question could not at the present stage be answered by work-
ing out the correlation-coefficient between the last pair of variables,
for we have as yet no guide as to how far a correlation between
        <pb n="256" />

the variables 1 and 2 can be accounted for by correlations
between 1 and 3 and 2 and 3. Again, in the case of illustration iii.,
Chap. X., a marked positive correlation might be observed between,
say, the bulk of a crop and the rainfall during a certain period, and
practically no correlation between the crop and the accumulated
temperature during the same period ; and the question might arise
whether the last result might not be due merely to a negative
correlation between rain and accumulated temperature, the crop
being favourably affected by an increase of accumulated temper-
ature if other things were equal, but failing as a rule to obtain this
benefit owing to the concomitant deficiency of rain. In the prob-
lem of inheritance in a population, the corresponding problem is
of great importance, as already indicated in Chapter IV. It is
essential for the discussion of possible hypotheses to know whether
an observed correlation between, say, grandson and grandparent
can or cannot be accounted for solely by observed correlations
between grandson and parent, parent and grandparent.

2. Problems of this type, in which it is necessary to consider
simultaneously the relations between at least three variables, and
possibly more, may be treated by a simple and natural extension
of the method used in the case of two variables. The latter case
was discussed by forming linear equations between the two
variables, assigning such values to the constants as to make the
sum of the squares of the errors of estimate as low as possible :
the more complicated case may be discussed by forming linear
equations between any one of the n variables involved, taking
each in turn, and the n − 1 others, again assigning such values to
the constants as to make the sum of the squares of the errors of
estimate a minimum. If the variables are X₁, X₂, X₃, . . . . Xₙ,
the equation will be of the form

$$X_1 = a + b_2 X_2 + b_3 X_3 + \cdots + b_n X_n$$
If in such a generalised regression or characteristic equation we
find a sensible positive value for any one coefficient such as b₂,
we know that there must be a positive correlation between X₁
and X₂ that cannot be accounted for by mere correlations of X₁
and X₂ with X₃, X₄, . . . . or Xₙ, for the effects of changes in these
variables are allowed for in the remaining terms on the right.
The magnitude of b₂ gives, in fact, the mean change in X₁
associated with a unit change in X₂ when all the remaining
variables are kept constant. The correlation between X₁ and
X₂ indicated by b₂ may be termed a partial correlation, as
corresponding with the partial association of Chapter IV., and it
is required to deduce from the values of the coefficients b, which
may be termed partial regressions, partial coefficients of
        <pb n="257" />
        XII.—PARTIAL CORRELATION. 231
correlation giving the correlation between X₁ and X₂ or other pair of
variables when the remaining variables X₃ . . . . Xₙ are kept
constant, or when changes in these variables are corrected or allowed
for, so far as this may be done with a linear equation. For examples
of such generalised regression-equations the student may turn to
the illustrations worked out below (pp. 239-247).

3. With this explanatory introduction, we may now proceed to
the algebraic theory of such generalised regression-equations and
of multiple correlation in general. It will first, however, be as
well to revert briefly to the case of two variables. In Chapter IX.,
to obtain the greatest possible simplicity of treatment, the value
of the coefficient r = p/σ₁σ₂ was deduced on the special assump-
tion that the means of all arrays were strictly collinear, and the
meaning of the coefficient in the more general case was sub-
sequently investigated. Such a process is not conveniently
applicable when a number of variables are to be taken into
account, and the problem has to be faced directly: i.e. required,
to determine the coefficients and constant term, if any, in a
regression-equation, so as to make the sum of the squares of the
errors of estimate a minimum. We will take this problem first
for the case of two variables, introducing a notation that can be
conveniently adapted to more. Let us take the arithmetic
means of the variables as origins of measurement, and let x₁, x₂
denote deviations of the two variables from their respective
means. Then it is required to determine a₁ and b₁₂ in the re-
gression-equation

$$x_1 = a_1 + b_{12}\,x_2 \qquad (a)$$

so as to make Σ(x₁ − a₁ − b₁₂x₂)² for all associated pairs of
deviations x₁ and x₂ the least possible. Put more briefly, if
we write

$$N \cdot s_{12}^2 = \Sigma(x_1 - a_1 - b_{12}\,x_2)^2 \qquad (b)$$

so that s₁₂ is the root-mean-square value of the errors of estimate
in using regression-equation (a) (cf. Chap. IX. § 14), it is required
to make s₁₂ a minimum. Suppose any value whatever to be
assigned to b₁₂, and a series of values of a₁ to be tried, s₁₂ being
calculated for each. Evidently s₁₂ would be very large for
values of a₁ that erred greatly either in excess or defect of the
best value (for the given value of b₁₂), and would continuously
decrease as this best value was approached; the value of s₁₂ could
never become negative, though possibly, but exceptionally, zero.
If therefore the values of s₁₂ were plotted to the values of a₁ on
a diagram, a curve would be obtained more or less like that
of fig. 44. The best value of a₁, for which s₁₂ attained its
        <pb n="258" />
minimum value, could be approximately estimated from
such a diagram; but it can be calculated with much more exact-
ness from the condition that if a′₁, a″₁ be two values close above
and below the best, the corresponding values of s₁₂ are equal. Let
a₁ and (a₁ + δ) be two such values. Then if

$$\Sigma(x_1 - a_1 - b_{12}\,x_2)^2 = \Sigma(x_1 - a_1 - \delta - b_{12}\,x_2)^2$$

when δ is very small, the value of a₁ is the best for the assigned
value of b₁₂. But, evidently, the equation gives, neglecting
the term in δ²,

$$\Sigma(x_1 - a_1 - b_{12}\,x_2) = 0,$$

that is, since Σ(x₁) and Σ(x₂) are both zero,

$$a_1 = 0$$

whatever the value of b₁₂.

[Fig. 44: s₁₂ plotted against trial values of a₁.]

This is the direct proof of the
result that no constant term need be introduced on the right
of a regression-equation when written in terms of deviations
from the arithmetic mean, or that the two lines of regression
must pass through the mean (Chap. IX. § 10). We may
therefore omit any constant term. If, now, b₁₂ is to be assigned
the best value, we must have, by similar reasoning, for slightly
differing values b₁₂, b₁₂ + δ,

$$\Sigma(x_1 - b_{12}\,x_2)^2 = \Sigma(x_1 - [b_{12} + \delta]\,x_2)^2$$

that is, again neglecting terms in δ²,

$$\Sigma x_2\,(x_1 - b_{12}\,x_2) = 0$$

or, breaking up the sum,

$$b_{12} = \frac{\Sigma(x_1 x_2)}{\Sigma(x_2^2)} \qquad (c)$$
        <pb n="259" />
which is the value found by the previous indirect method of
Chapter IX. From the fact that b₁₂ is determined so as to
make the value of Σ(x₁ − b₁₂x₂)² the least possible, the method
of determination is sometimes called the method of least squares.
Evidently all the remaining results of Chapter IX. follow from
this, and notably we have for σ₁.₂, the minimum value of s₁₂,
the standard-deviation of errors of estimate

$$\sigma_{1.2}^2 = \sigma_1^2\,(1 - r_{12}^2) \qquad (d)$$
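The argument of § 3 can be checked numerically on any small set of observations. The sketch below uses hypothetical data and helper names. It measures deviations from the means and computes b₁₂ from eqn. (c). It then compares the root-mean-square error of estimate found directly with σ₁(1 − r₁₂²)^½ from eqn. (d). It also confirms that the errors sum to zero, which is why no constant term is needed.

```python
import math


def deviations(values):
    """Deviations from the arithmetic mean."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def two_variable_regression(X1, X2):
    """Return b12 (eqn (c)), r12, the r.m.s. error of estimate found directly,
    the same quantity from eqn (d), and the sum of the errors."""
    x1, x2 = deviations(X1), deviations(X2)
    n = len(x1)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    b12 = s12 / s22                                   # eqn (c)
    r12 = s12 / math.sqrt(s11 * s22)
    errors = [a - b12 * b for a, b in zip(x1, x2)]    # x1 - b12.x2
    sigma_direct = math.sqrt(sum(e * e for e in errors) / n)
    sigma_eqn_d = math.sqrt(s11 / n) * math.sqrt(1 - r12 ** 2)  # eqn (d)
    return b12, r12, sigma_direct, sigma_eqn_d, sum(errors)


# Hypothetical observations
b12, r12, sigma_direct, sigma_eqn_d, error_sum = two_variable_regression(
    [2, 4, 5, 7, 9], [1, 2, 4, 5, 8])
print(b12, sigma_direct, sigma_eqn_d, error_sum)
```

The two standard-deviations agree because Σ(x₁ − b₁₂x₂)² = Σ(x₁²) − Σ(x₁x₂)²/Σ(x₂²) = Σ(x₁²)(1 − r₁₂²) when b₁₂ has its least-square value.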
4. Now apply the same method to the regression-equation
for n variables. Writing the equation in terms of deviations,
it follows from reasoning precisely similar to that given above
that no constant term need be entered on the right-hand
side. For the partial regression-coefficients (the coefficients of
the x's on the right) a special notation will be used in order
that the exact position of each coefficient may be rendered quite
definite. The first subscript affixed to the letter b (which will
always be used to denote a regression) will be the subscript of
the x on the left (the dependent variable), and the second will
be the subscript of the x to which it is attached; these may
be called the primary subscripts. After the primary subscripts,
and separated from them by a point, are placed the subscripts
of all the remaining variables on the right-hand side as secondary
subscripts. The regression-equation will therefore be written
in the form

$$x_1 = b_{12.34\ldots n}\,x_2 + b_{13.24\ldots n}\,x_3 + \cdots + b_{1n.23\ldots(n-1)}\,x_n \qquad (1)$$

The order in which the secondary subscripts are written is,
it should be noted, quite indifferent, but the order of the
primary subscripts is material; e.g. b₁₂.₃₄…ₙ and b₂₁.₃₄…ₙ
denote quite distinct coefficients, x₁ being the dependent variable
in the first case and x₂ in the second. A coefficient with p
secondary subscripts may be termed a regression of the pth order.
The regressions b₁₂, b₂₁, b₁₃, b₃₁, etc., in the case of two variables
may be regarded as of order zero, and may be termed total as
distinct from partial regressions.
5. In the case of two variables, the correlation-coefficient r₁₂
may be regarded as defined by the equation

$$r_{12} = (b_{12} \cdot b_{21})^{1/2}$$

We shall generalise this equation in the form

$$r_{12.34\ldots n} = (b_{12.34\ldots n} \cdot b_{21.34\ldots n})^{1/2} \qquad (2)$$

This is at present a pure definition of a new symbol, and it
remains to be shown that r₁₂.₃₄…ₙ may really be regarded as,
        <pb n="260" />

and possesses all the properties of, a correlation-coefficient; the
name may, however, be applied to it, pending the proof. A
correlation-coefficient with p secondary subscripts will be termed
a correlation of order p. Evidently, in the case of a correlation-
coefficient, the order in which both primary and secondary
subscripts are written is indifferent, for the right-hand side of
equation (2) is unaltered by writing 2 for 1 and 1 for 2. The
correlations r₁₂, r₁₃, etc., may be regarded as of order zero, and
spoken of as total, as distinct from partial, correlations.

6. If the regressions b₁₂.₃₄…ₙ, b₁₃.₂₄…ₙ, etc., be assigned the
“best” values, as determined by the method of least squares, the
difference between the actual value of x₁ and the value assigned
by the right-hand side of the regression-equation (1), that is, the
error of estimate, will be denoted by x₁.₂₃…ₙ; i.e. as a defini-
tion we have

$$x_{1.23\ldots n} = x_1 - b_{12.34\ldots n}\,x_2 - b_{13.24\ldots n}\,x_3 - \cdots - b_{1n.23\ldots(n-1)}\,x_n \qquad (3)$$

where x₁, x₂, . . . . xₙ are assigned any one set of observed values.
Such an error (or residual, as it is sometimes called), denoted by a
symbol with p secondary suffixes, will be termed a deviation of the
pth order. Finally, we will define a generalised standard-deviation
σ₁.₂₃…ₙ by the equation

$$N \cdot \sigma_{1.23\ldots n}^2 = \Sigma(x_{1.23\ldots n}^2) \qquad (4)$$

N being, as usual, the number of observations. A standard-
deviation denoted by a symbol with p secondary suffixes will be
termed a standard-deviation of the pth order, the standard-
deviations σ₁, σ₂, etc., being regarded as of order zero, the standard-
deviations σ₁.₂, σ₂.₁, etc. (cf. eqn. (d) of § 3), of the first order, and
so on.

7. From the reasoning of § 3 it follows that the “least-square”
values of the partial regressions b₁₂.₃₄…ₙ, etc., will be given by
equations of the form

$$\Sigma(x_1 - b_{12.34\ldots n}\,x_2 - \cdots - b_{1n.23\ldots(n-1)}\,x_n)^2 = \Sigma(x_1 - [b_{12.34\ldots n} + \delta]\,x_2 - \cdots - b_{1n.23\ldots(n-1)}\,x_n)^2$$

δ being very small. That is, neglecting the term in δ²,

$$\Sigma x_2\,(x_1 - b_{12.34\ldots n}\,x_2 - \cdots - b_{1n.23\ldots(n-1)}\,x_n) = 0,$$

or, more briefly, in terms of the notation of equation (3),

$$\Sigma(x_2 \cdot x_{1.23\ldots n}) = 0 \qquad (5)$$

There are a large number of these equations, (n − 1) for determin-
ing the coefficients b₁₂.₃₄…ₙ, etc., (n − 1) again for determining
        <pb n="261" />
the coefficients b₂₁.₃₄…ₙ, etc., and so on: they are sometimes
termed the normal equations. If the student will follow the pro-
cess by which (5) was obtained, he will see that when the con-
dition is expressed that b₁₂.₃₄…ₙ shall possess the “least-square”
value, x₂ enters into the product-sum with x₁.₂₃…ₙ; when the
same condition is expressed for b₁₃.₂₄…ₙ, x₃ enters into the
product-sum, and so on. Taking each regression in turn, in fact,
every x the suffix of which is included in the secondary suffixes
of x₁.₂₃…ₙ enters into the product-sum. The normal equations
of the form (5) are therefore equivalent to the theorem—

The product-sum of any deviation of order zero with any deviation
of higher order is zero, provided the subscript of the former occur
among the secondary subscripts of the latter.

8. But it follows from this that

$$\Sigma(x_{1.34\ldots n} \cdot x_{2.34\ldots n}) = \Sigma x_{1.34\ldots n}\,(x_2 - b_{23.45\ldots n}\,x_3 - \cdots - b_{2n.34\ldots(n-1)}\,x_n) = \Sigma(x_{1.34\ldots n} \cdot x_2)$$

Similarly,

$$\Sigma(x_{1.34\ldots n} \cdot x_{2.34\ldots(n-1)}) = \Sigma(x_{1.34\ldots n} \cdot x_2)$$

Similarly again,

$$\Sigma(x_{1.34\ldots n} \cdot x_{2.34\ldots(n-2)}) = \Sigma(x_{1.34\ldots n} \cdot x_2)$$

and so on. Therefore, quite generally,

$$\Sigma(x_{1.34\ldots n} \cdot x_{2.34\ldots n}) = \Sigma(x_{1.34\ldots n} \cdot x_{2.34\ldots(n-1)}) = \Sigma(x_{1.34\ldots n} \cdot x_{2.34\ldots(n-2)}) = \cdots = \Sigma(x_{1.34\ldots n} \cdot x_2)$$

Comparing all the equal product-sums that may be obtained
in this way, we see that the product-sum of any two deviations is
unaltered by omitting any or all of the secondary subscripts of either
which are common to the two, and, conversely, the product-sum of any
deviation of order p with a deviation of order p + q, the p subscripts
being the same in each case, is unaltered by adding to the secondary
subscripts of the former any or all of the q additional subscripts of
the latter.

It follows therefore from (5) that any product-sum is zero if all
the subscripts of the one deviation occur among the secondary sub-
scripts of the other. As the simplest case, we may note that x₁ is
uncorrelated with x₂.₁, and x₂ uncorrelated with x₁.₂.

The theorems of this and of the preceding paragraph are of
fundamental importance, and should be carefully remembered.
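The theorems of §§ 7-8 are easy to confirm on data. The sketch below uses hypothetical observations on three variables and forms first- and second-order deviations by least squares. The two normal equations for x₁.₂₃ are solved by Cramer's rule, which is a convenience of this sketch rather than the text's method. The sketch then checks that the product-sums vanish or agree as the theorems require. Helper names are illustrative.

```python
def dev(values):
    """Deviations from the arithmetic mean."""
    m = sum(values) / len(values)
    return [v - m for v in values]


def psum(u, v):
    """Product-sum Sigma(u.v)."""
    return sum(a * b for a, b in zip(u, v))


def residual_1(y, x):
    """First-order deviation, e.g. x1.3 = x1 - b13.x3."""
    b = psum(y, x) / psum(x, x)
    return [a - b * c for a, c in zip(y, x)]


def residual_2(y, x, z):
    """Second-order deviation, e.g. x1.23, with the two partial regressions
    taken from the normal equations Sigma(x . y.xz) = Sigma(z . y.xz) = 0."""
    sxx, szz, sxz = psum(x, x), psum(z, z), psum(x, z)
    syx, syz = psum(y, x), psum(y, z)
    det = sxx * szz - sxz * sxz
    b_x = (syx * szz - syz * sxz) / det
    b_z = (syz * sxx - syx * sxz) / det
    return [a - b_x * c - b_z * e for a, c, e in zip(y, x, z)]


# Hypothetical observations on three variables
x1 = dev([3, 5, 4, 8, 9, 7])
x2 = dev([1, 3, 2, 5, 6, 4])
x3 = dev([2, 2, 5, 4, 7, 8])

x1_23 = residual_2(x1, x2, x3)
x1_3, x2_3 = residual_1(x1, x3), residual_1(x2, x3)
x2_1 = residual_1(x2, x1)

checks = {
    "Sigma(x2 . x1.23)": psum(x2, x1_23),     # zero, eqn (5)
    "Sigma(x3 . x1.23)": psum(x3, x1_23),     # zero, eqn (5)
    "Sigma(x1 . x2.1)": psum(x1, x2_1),       # zero, simplest case
    "Sigma(x1.3 . x2.3) - Sigma(x1.3 . x2)": psum(x1_3, x2_3) - psum(x1_3, x2),  # zero, section 8
}
for name, value in checks.items():
    print(name, round(value, 10))
```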

        <pb n="262" />
9. We have now from §§ 7 and 8—

$$0 = \Sigma(x_{2.34\ldots n} \cdot x_{1.23\ldots n})$$

$$= \Sigma x_{2.34\ldots n}\,(x_1 - b_{12.34\ldots n}\,x_2 - \text{terms in } x_3 \text{ to } x_n)$$

$$= \Sigma(x_{2.34\ldots n} \cdot x_1) - b_{12.34\ldots n}\,\Sigma(x_{2.34\ldots n} \cdot x_2)$$

$$= \Sigma(x_{1.34\ldots n} \cdot x_{2.34\ldots n}) - b_{12.34\ldots n}\,\Sigma(x_{2.34\ldots n}^2)$$

That is

$$b_{12.34\ldots n} = \frac{\Sigma(x_{1.34\ldots n} \cdot x_{2.34\ldots n})}{\Sigma(x_{2.34\ldots n}^2)} \qquad (7)$$

But this is the value that would have been obtained by taking a
regression-equation of the form

$$x_{1.34\ldots n} = b_{12.34\ldots n}\,x_{2.34\ldots n}$$

and determining b₁₂.₃₄…ₙ by the method of least squares, i.e.
b₁₂.₃₄…ₙ is the regression of x₁.₃₄…ₙ on x₂.₃₄…ₙ. It follows
at once from (2) that r₁₂.₃₄…ₙ is the correlation between
x₁.₃₄…ₙ and x₂.₃₄…ₙ, and from (4) that we may write

$$b_{12.34\ldots n} = r_{12.34\ldots n}\,\frac{\sigma_{1.34\ldots n}}{\sigma_{2.34\ldots n}} \qquad (8)$$

an equation identical with the familiar relation b₁₂ = r₁₂σ₁/σ₂,
with the secondary suffixes 34 . . . . n added throughout.

To illustrate the meaning of the equation by the simplest case,
if we had three variables only, x₁, x₂, and x₃, the value of b₁₂.₃ or
r₁₂.₃ could be determined (1) by finding the correlations r₁₃ and
r₂₃ and the corresponding regressions b₁₃ and b₂₃; (2) working out
the residuals x₁ − b₁₃x₃ and x₂ − b₂₃x₃ for all associated deviations;
(3) working out the correlation between the residuals associated
with the same values of x₃. The method would not, however, be
a practical one, as the arithmetic would be extremely lengthy,
much more lengthy than the method given below for expressing
a correlation of order p in terms of correlations of order p − 1.
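The three-step procedure just described can be written out directly. It can then be compared with the standard expression for a first-order partial correlation, r₁₂.₃ = (r₁₂ − r₁₃r₂₃)/√((1 − r₁₃²)(1 − r₂₃²)), which is the first-order case of the reduction the text refers to in § 13. The data and function names are hypothetical. The point is only that both routes give the same number, the formula with far less arithmetic.

```python
import math


def dev(values):
    """Deviations from the arithmetic mean."""
    m = sum(values) / len(values)
    return [v - m for v in values]


def psum(u, v):
    """Product-sum Sigma(u.v)."""
    return sum(a * b for a, b in zip(u, v))


def corr(u, v):
    """Correlation of two series already measured from their means."""
    return psum(u, v) / math.sqrt(psum(u, u) * psum(v, v))


def partial_r_from_residuals(x1, x2, x3):
    """r12.3 as the correlation between x1.3 and x2.3 (steps (1)-(3))."""
    b13 = psum(x1, x3) / psum(x3, x3)
    b23 = psum(x2, x3) / psum(x3, x3)
    x1_3 = [a - b13 * c for a, c in zip(x1, x3)]
    x2_3 = [b - b23 * c for b, c in zip(x2, x3)]
    return corr(x1_3, x2_3)


def partial_r_from_totals(r12, r13, r23):
    """Standard first-order formula for r12.3 in terms of total correlations."""
    return (r12 - r13 * r23) / math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))


# Hypothetical observations
x1 = dev([3, 5, 4, 8, 9, 7])
x2 = dev([1, 3, 2, 5, 6, 4])
x3 = dev([2, 2, 5, 4, 7, 8])

via_residuals = partial_r_from_residuals(x1, x2, x3)
via_totals = partial_r_from_totals(corr(x1, x2), corr(x1, x3), corr(x2, x3))
print(round(via_residuals, 6), round(via_totals, 6))
```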

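10. Any standard-deviation of order p may be expressed in terms
of a standard-deviation of order p − 1 and a correlation of order p − 1.
For,

$$\Sigma(x_{1.23\ldots n}^2) = \Sigma(x_{1.23\ldots(n-1)} \cdot x_{1.23\ldots n})$$

$$= \Sigma x_{1.23\ldots(n-1)}\,(x_1 - b_{1n.23\ldots(n-1)}\,x_n - \text{terms in } x_2 \text{ to } x_{n-1})$$

$$= \Sigma(x_{1.23\ldots(n-1)}^2) - b_{1n.23\ldots(n-1)}\,\Sigma(x_{1.23\ldots(n-1)} \cdot x_{n.23\ldots(n-1)})$$

or, dividing through by the number of observations,

$$\sigma_{1.23\ldots n}^2 = \sigma_{1.23\ldots(n-1)}^2\,(1 - b_{1n.23\ldots(n-1)}\,b_{n1.23\ldots(n-1)})$$

$$= \sigma_{1.23\ldots(n-1)}^2\,(1 - r_{1n.23\ldots(n-1)}^2) \qquad (9)$$
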
        <pb n="263" />

This is again the relation of the familiar form—

$$\sigma_{1.n}^2 = \sigma_1^2\,(1 - r_{1n}^2)$$

with the secondary suffixes 23 . . . . (n − 1) added throughout.
It is clear from (9) that r₁ₙ.₂₃…(n−1), like any correlation of order
zero, cannot be numerically greater than unity. It also follows
at once that if we have been estimating x₁ from x₂, x₃, . . . . xₙ₋₁, the
addition of xₙ will not increase the accuracy of estimate unless
r₁ₙ.₂₃…(n−1) (not r₁ₙ) differ from zero. This condition is somewhat
interesting, as it leads to rather unexpected results. For example, if
r₁₂ = +0·8, r₁₃ = +0·4, r₂₃ = +0·5, it will not be possible to estimate x₁ with
any greater accuracy from x₂ and x₃ than from x₂ alone, for the
value of r₁₃.₂ is zero (see below, § 13).
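The example just given can be worked through with eqn. (9). The sketch assumes the standard first-order formula for a partial correlation, which the text reaches in § 13, to obtain r₁₃.₂. The helper name `partial_r` is hypothetical. Both ratios σ²₁.₂/σ²₁ and σ²₁.₂₃/σ²₁ come out at 0·36, so bringing in x₃ gains nothing.

```python
import math


def partial_r(r_ab, r_ac, r_bc):
    """First-order partial correlation r_ab.c from total correlations."""
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))


r12, r13, r23 = 0.8, 0.4, 0.5           # the values of the example in section 10
r13_2 = partial_r(r13, r12, r23)        # r13.2, with x2 held fixed

ratio_x2_only = 1 - r12 ** 2                          # sigma^2_1.2 / sigma^2_1
ratio_x2_and_x3 = ratio_x2_only * (1 - r13_2 ** 2)    # eqn (9): sigma^2_1.23 / sigma^2_1
print(r13_2, round(ratio_x2_only, 6), round(ratio_x2_and_x3, 6))
```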

11. It should be noted that, in equation (9), any other subscript can be eliminated in the same way as subscript n from the suffix of $\sigma_{1.23\ldots n}$, so that a standard-deviation of order p can be expressed in p ways in terms of standard-deviations of the next lower order. This is useful as affording an independent check on arithmetic. Further, $\sigma_{1.23\ldots(n-1)}$ can be expressed in the same way in terms of $\sigma_{1.23\ldots(n-2)}$, and so on, so that we must have

$$\sigma_{1.23\ldots n}^2 = \sigma_1^2(1 - r_{12}^2)(1 - r_{13.2}^2)(1 - r_{14.23}^2)\cdots(1 - r_{1n.23\ldots(n-1)}^2) \qquad (10)$$

This is an extremely convenient expression for arithmetical use; the arithmetic can again be subjected to an absolute check by eliminating the subscripts in a different, say the inverse, order. Apart from the algebraic proof, it is obvious that the values must be identical; for if we are estimating one variable from n − 1 others, it is clearly indifferent in what order the latter are taken into account.
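Equation (10) and its independence of order can be checked numerically. The Python sketch below is illustrative only. `partial` applies equation (12) of § 13 recursively, and `residual_sd` multiplies the factors of (10) for a chosen order of the predictors. As test data it borrows the zero-order coefficients and $\sigma_1 = 29.2$ of Example ii (Table I). Every order of elimination should give the same $\sigma_{1.234}$.

```python
import math
from itertools import permutations

# Zero-order correlations of Example ii (Table I), keyed by unordered pair
R0 = {
    frozenset((1, 2)): 0.52, frozenset((1, 3)): 0.41, frozenset((1, 4)): -0.14,
    frozenset((2, 3)): 0.49, frozenset((2, 4)): 0.23, frozenset((3, 4)): 0.25,
}

def partial(i, j, given):
    """r_{ij.given}, eliminating the last secondary suffix by equation (12)."""
    if not given:
        return R0[frozenset((i, j))]
    *rest, k = given
    r_ij, r_ik, r_jk = partial(i, j, rest), partial(i, k, rest), partial(j, k, rest)
    return (r_ij - r_ik * r_jk) / math.sqrt((1 - r_ik**2) * (1 - r_jk**2))

def residual_sd(sigma1, order):
    """sigma_{1.23...n} by equation (10), bringing in the predictors in `order`."""
    var = sigma1**2
    for step, k in enumerate(order):
        var *= 1 - partial(1, k, list(order[:step])) ** 2
    return math.sqrt(var)

sigma1 = 29.2  # standard-deviation of variable 1 in Table I
results = {order: residual_sd(sigma1, order) for order in permutations((2, 3, 4))}
for order, s in results.items():
    print(order, round(s, 4))
```

All six orders print the same value, which is the "absolute check" the text describes.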

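12. Any regression of order p may be expressed in terms of regressions of order p − 1. For we have

$$\Sigma(x_{1.34\ldots n} \cdot x_{2.34\ldots n}) = \Sigma(x_{1.34\ldots(n-1)} \cdot x_{2.34\ldots n})$$

$$= \Sigma\{x_{1.34\ldots(n-1)}\,(x_2 - b_{2n.34\ldots(n-1)}\,x_n - \text{terms in } x_3 \text{ to } x_{n-1})\}$$

$$= \Sigma(x_{1.34\ldots(n-1)} \cdot x_{2.34\ldots(n-1)}) - b_{2n.34\ldots(n-1)}\,\Sigma(x_{1.34\ldots(n-1)} \cdot x_{n.34\ldots(n-1)})$$

Replacing $b_{2n.34\ldots(n-1)}$ by $b_{n2.34\ldots(n-1)} \cdot \sigma_{2.34\ldots(n-1)}^2 / \sigma_{n.34\ldots(n-1)}^2$ and dividing through by the number of observations, we have

$$b_{12.34\ldots n} \cdot \sigma_{2.34\ldots n}^2 = b_{12.34\ldots(n-1)} \cdot \sigma_{2.34\ldots(n-1)}^2 - b_{1n.34\ldots(n-1)} \cdot b_{n2.34\ldots(n-1)} \cdot \sigma_{2.34\ldots(n-1)}^2$$

or, from (9),

$$b_{12.34\ldots n} = \frac{b_{12.34\ldots(n-1)} - b_{1n.34\ldots(n-1)} \cdot b_{n2.34\ldots(n-1)}}{1 - b_{2n.34\ldots(n-1)} \cdot b_{n2.34\ldots(n-1)}} \qquad (11)$$

The student should note that this is an expression of the form

$$b_{12.n} = \frac{b_{12} - b_{1n} \cdot b_{n2}}{1 - b_{2n} \cdot b_{n2}}$$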

        <pb n="264" />
        2 THEORY OF STATISTICS.

with the subscripts 34 . . . (n − 1) added throughout. The coefficient $b_{12.34\ldots n}$ may therefore be regarded as determined from a regression-equation of the form

$$x_{1.34\ldots(n-1)} = b_{12.34\ldots n} \cdot x_{2.34\ldots(n-1)} + b_{1n.23\ldots(n-1)} \cdot x_{n.34\ldots(n-1)}$$

i.e. it is the partial regression of $x_{1.34\ldots(n-1)}$ on $x_{2.34\ldots(n-1)}$, $x_{n.34\ldots(n-1)}$ being given. As any other secondary suffix might have been eliminated in lieu of n, we might also regard it as the partial regression of $x_{1.45\ldots n}$ on $x_{2.45\ldots n}$, $x_{3.45\ldots n}$ being given, and so on.
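A quick numerical check of (11) in its three-variable form, $b_{12.3} = (b_{12} - b_{13}b_{32})/(1 - b_{23}b_{32})$, can be made with the constants of Example i in § 14. The sketch below is an illustration, not part of the original method. It builds the zero-order regressions from $b_{jk} = r_{jk}\sigma_j/\sigma_k$ and compares the result with $r_{12.3}\,\sigma_{1.3}/\sigma_{2.3}$.

```python
import math

# Zero-order constants of Example i: 1 = earnings, 2 = pauperism, 3 = out-relief ratio
sigma = {1: 1.71, 2: 1.29, 3: 3.09}
r = {(1, 2): -0.66, (1, 3): -0.13, (2, 3): 0.60}

def corr(j, k):
    return r[(min(j, k), max(j, k))]

def b(j, k):
    # zero-order regression of x_j on x_k
    return corr(j, k) * sigma[j] / sigma[k]

# Equation (11) with n = 3 and no further secondary suffixes
b12_3 = (b(1, 2) - b(1, 3) * b(3, 2)) / (1 - b(2, 3) * b(3, 2))

# The same coefficient from the partial correlation and first-order deviations
r12_3 = (corr(1, 2) - corr(1, 3) * corr(2, 3)) / math.sqrt(
    (1 - corr(1, 3)**2) * (1 - corr(2, 3)**2))
sigma1_3 = sigma[1] * math.sqrt(1 - corr(1, 3)**2)
sigma2_3 = sigma[2] * math.sqrt(1 - corr(2, 3)**2)
print(round(b12_3, 4), round(r12_3 * sigma1_3 / sigma2_3, 4))
```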

13. From equation (11) we may readily obtain a corresponding equation for correlations. For (11) may be written

$$b_{12.34\ldots n} = \frac{r_{12.34\ldots(n-1)} - r_{1n.34\ldots(n-1)} \cdot r_{2n.34\ldots(n-1)}}{1 - r_{2n.34\ldots(n-1)}^2} \cdot \frac{\sigma_{1.34\ldots(n-1)}}{\sigma_{2.34\ldots(n-1)}}$$

Hence, writing down the corresponding expression for $b_{21.34\ldots n}$ and taking the square root of the product,

$$r_{12.34\ldots n} = \frac{r_{12.34\ldots(n-1)} - r_{1n.34\ldots(n-1)} \cdot r_{2n.34\ldots(n-1)}}{\sqrt{(1 - r_{1n.34\ldots(n-1)}^2)(1 - r_{2n.34\ldots(n-1)}^2)}} \qquad (12)$$

This is, similarly, the expression for three variables

$$r_{12.n} = \frac{r_{12} - r_{1n} \cdot r_{2n}}{\sqrt{(1 - r_{1n}^2)(1 - r_{2n}^2)}}$$

with the secondary subscripts added throughout, and $r_{12.34\ldots n}$ can be assigned interpretations corresponding to those of $b_{12.34\ldots n}$ above. Evidently equation (12) permits of an absolute check on the arithmetic in the calculation of all partial coefficients of an order higher than the first, for any one of the secondary suffixes of $r_{12.34\ldots n}$ can be eliminated so as to obtain another equation of the same form as (12), and the value obtained for $r_{12.34\ldots n}$ by inserting the values of the coefficients of lower order in the expression on the right must be the same in each case.
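The recursion in (12), together with the absolute check obtained by eliminating a different secondary suffix, might be coded as below. This is a sketch under our own conventions. Zero-order coefficients are held in a dictionary keyed by unordered pairs, and `partial` removes whichever secondary suffix is named by its `drop` argument. The two routes to $r_{12.34}$ are compared using the zero-order coefficients of Example ii (Table I).

```python
import math

R0 = {
    frozenset((1, 2)): 0.52, frozenset((1, 3)): 0.41, frozenset((1, 4)): -0.14,
    frozenset((2, 3)): 0.49, frozenset((2, 4)): 0.23, frozenset((3, 4)): 0.25,
}

def partial(i, j, given, drop=-1):
    """r_{ij.given} by equation (12), eliminating given[drop] at the top level.

    Lower-order coefficients always eliminate their last suffix; by the
    argument of this section the result must not depend on that choice.
    """
    given = list(given)
    if not given:
        return R0[frozenset((i, j))]
    k = given.pop(drop)
    r_ij, r_ik, r_jk = partial(i, j, given), partial(i, k, given), partial(j, k, given)
    return (r_ij - r_ik * r_jk) / math.sqrt((1 - r_ik**2) * (1 - r_jk**2))

via_4 = partial(1, 2, [3, 4], drop=-1)  # eliminate 4: uses r12.3, r14.3, r24.3
via_3 = partial(1, 2, [3, 4], drop=0)   # eliminate 3: uses r12.4, r13.4, r23.4
print(round(via_4, 4), round(via_3, 4))
```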

14. The equations now obtained provide all that is necessary
for the arithmetical solution of problems in multiple correlation.
The best mode of procedure on the whole, having calculated all
the correlations and standard-deviations of order zero, is (1) to
calculate the correlations of higher order by successive applications
of equation (12); (2) to calculate any required standard-deviations
by equation (10); (3) to calculate any required regressions by
equation (8): the use of equation (11) for calculating the
regressions of successive orders directly from each other is com-
paratively clumsy. We will give two illustrations, the first for

        <pb n="265" />
three and the second for four variables. The introduction of
more variables does not involve any difference in the form of the
arithmetic, but rapidly increases the amount.

Example i.—The first illustration we shall take will be a
continuation of example i. of Chapter IX. in which the correla-
tion was worked out between (1) the average earnings of agri-
cultural labourers and (2) the percentage of the population in
receipt of Poor-law relief in a group of 38 rural districts. In
Question 2 of the same chapter are given (3) the ratios of the
numbers in receipt of outdoor relief to the numbers relieved in the
workhouse, in the same districts. Required to work out the partial
correlations, regressions, etc., for these three variables.

Using as our notation $X_1$ = average earnings, $X_2$ = percentage of population in receipt of relief, $X_3$ = out-relief ratio, the first constants determined are:

$M_1 = 15.9$ shillings    $\sigma_1 = 1.71$ shillings    $r_{12} = -0.66$
$M_2 = 3.67$ per cent.    $\sigma_2 = 1.29$ per cent.    $r_{13} = -0.13$
$M_3 = 5.79$              $\sigma_3 = 3.09$              $r_{23} = +0.60$

To obtain the partial correlations, equation (12) is used direct in its simplest form:

$$r_{12.3} = \frac{r_{12} - r_{13} \cdot r_{23}}{\sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}}$$

The work is best done systematically and the results collected in tabular form, especially if logarithms are used, as many of the logarithms occur repeatedly. First it will be noted that the logarithms of $\sqrt{1 - r^2}$ occur in all the denominators; these had, accordingly, better be worked out at once and tabulated (col. 2 of the table below). Logarithms are those of the numerical values, the signs being carried separately.

(1)            (2)           (3)       (4)        (5)         (6)          (7)          (8)              (9)
Coefficient    log √(1−r²)   Product   Numerator  log         log          log r        Coefficient      log √(1−r²)
(order zero)                 term                 numerator   denominator  (1st order)  (1st order)      (1st order)
r12 = −0.66    1̄.87580       −0.0780   −0.5820    1̄.76492     1̄.89938      1̄.86554      r12.3 = −0.73    1̄.83216
r13 = −0.13    1̄.99629       −0.3960   +0.2660    1̄.42488     1̄.77889      1̄.64599      r13.2 = +0.44    1̄.95266
r23 = +0.60    1̄.90309       +0.0858   +0.5142    1̄.71113     1̄.87209      1̄.83904      r23.1 = +0.69    1̄.85946

In col. 3 the product term of the numerator of each partial coefficient is entered, i.e. the product of the two other coefficients on the remaining lines in col. 1; subtracting this from the coefficient on the same line in col. 1 we have the numerator (col. 4) and can enter its logarithm (col. 5). The logarithm of the denominator (col. 6) is obtained at once by adding the two logarithms of $\sqrt{1 - r^2}$ on the remaining lines of the table, and subtracting the logarithms
        <pb n="266" />

of the denominators from those of the numerators we have the logarithms of the correlations of the first order (col. 7), and so their values (col. 8). It is also as well to calculate at once, for reference in the calculation of standard-deviations of the second order, the values of $\log\sqrt{1 - r^2}$ for the first-order coefficients (col. 9).
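For readers who would rather check the table by machine than by logarithms, the following sketch reproduces cols. 3, 4 and 8 for Example i. It is our own illustration and works with the coefficients directly instead of with their logarithms.

```python
import math

# Zero-order coefficients of Example i, keyed by the pair of variables
r = {(1, 2): -0.66, (1, 3): -0.13, (2, 3): 0.60}

def coef(a, b):
    return r[tuple(sorted((a, b)))]

rows = []
for i, j in [(1, 2), (1, 3), (2, 3)]:
    k = ({1, 2, 3} - {i, j}).pop()          # the variable held constant
    product = coef(i, k) * coef(j, k)       # col. 3: product term
    numerator = coef(i, j) - product        # col. 4
    denominator = math.sqrt((1 - coef(i, k)**2) * (1 - coef(j, k)**2))
    rows.append((f"r{i}{j}.{k}", round(product, 4), round(numerator, 4), numerator / denominator))

for name, product, numerator, value in rows:
    print(f"{name}: product {product:+.4f}, numerator {numerator:+.4f}, r = {value:+.2f}")
```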

Having obtained the correlations we can now proceed to the regressions. If we wish to find all the regression-equations, we shall have six regressions to calculate from equations of the form

$$b_{12.3} = r_{12.3} \cdot \frac{\sigma_{1.3}}{\sigma_{2.3}}$$

These will involve all the six standard-deviations of the first order, $\sigma_{1.2}$, $\sigma_{1.3}$, $\sigma_{2.1}$, $\sigma_{2.3}$, etc. But the standard-deviations of the first order are not in themselves of much interest, whereas the standard-deviations of the second order are, as being the standard-errors or root-mean-square errors of estimate made in using the regression-equations of the second order. We may save needless arithmetic, therefore, by replacing the standard-deviations of the first order by those of the second, omitting the former entirely, and transforming the above equation for $b_{12.3}$ to the form

$$b_{12.3} = r_{12.3} \cdot \frac{\sigma_{1.23}}{\sigma_{2.13}}$$

This transformation is a useful one and should be noted by the student; it holds because, by (9), $\sigma_{1.23} = \sigma_{1.3}\sqrt{1 - r_{12.3}^2}$ and $\sigma_{2.13} = \sigma_{2.3}\sqrt{1 - r_{12.3}^2}$, so that the ratio is unchanged. The values of each σ may be calculated twice independently by formulae of the form

$$\sigma_{1.23} = \sigma_1\sqrt{1 - r_{12}^2}\,\sqrt{1 - r_{13.2}^2}$$

$$= \sigma_1\sqrt{1 - r_{13}^2}\,\sqrt{1 - r_{12.3}^2}$$

so as to check the arithmetic; the work is rapidly done if the values of $\log\sqrt{1 - r^2}$ have been tabulated. The values found are

$\log\sigma_{1.23} = 0.06146$    $\sigma_{1.23} = 1.15$
$\log\sigma_{2.13} = \bar{1}.84584$    $\sigma_{2.13} = 0.70$
$\log\sigma_{3.12} = 0.34571$    $\sigma_{3.12} = 2.22$

From these and the logarithms of the r's we have

$\log b_{12.3} = 0.08116$, $b_{12.3} = -1.21$;    $\log b_{13.2} = \bar{1}.36174$, $b_{13.2} = +0.23$
$\log b_{21.3} = \bar{1}.64993$, $b_{21.3} = -0.45$;    $\log b_{23.1} = \bar{1}.33917$, $b_{23.1} = +0.22$
$\log b_{31.2} = \bar{1}.93024$, $b_{31.2} = +0.85$;    $\log b_{32.1} = 0.33891$, $b_{32.1} = +2.18$

That is, the regression-equations are

(1) $x_1 = -1.21x_2 + 0.23x_3$
(2) $x_2 = -0.45x_1 + 0.22x_3$
(3) $x_3 = +0.85x_1 + 2.18x_2$

        <pb n="267" />
or, transferring the origins to zero,

(1) Earnings    $X_1 = +19.0 - 1.21X_2 + 0.23X_3$
(2) Pauperism    $X_2 = +9.55 - 0.45X_1 + 0.22X_3$
(3) Out-relief ratio    $X_3 = -15.7 + 0.85X_1 + 2.18X_2$

The units are throughout one shilling for the earnings $X_1$, 1 per cent. for the pauperism $X_2$, and 1 for the out-relief ratio $X_3$.
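The second-order standard-deviations and the six regressions can likewise be recomputed directly. The sketch below is illustrative. It evaluates each σ by both forms of the formula above as a check, and forms every $b_{jk.l}$ as $r_{jk.l}\,\sigma_{j.kl}/\sigma_{k.jl}$. The helper names are our own.

```python
import math

sigma = {1: 1.71, 2: 1.29, 3: 3.09}
r0 = {frozenset((1, 2)): -0.66, frozenset((1, 3)): -0.13, frozenset((2, 3)): 0.60}

def r(a, b, given=None):
    """Zero-order r_ab, or the first-order r_{ab.given}."""
    if given is None:
        return r0[frozenset((a, b))]
    r_ab, r_ag, r_bg = r(a, b), r(a, given), r(b, given)
    return (r_ab - r_ag * r_bg) / math.sqrt((1 - r_ag**2) * (1 - r_bg**2))

def second_order_sd(j):
    """sigma_{j.km} in both orders of elimination, returned as a pair."""
    k, m = sorted({1, 2, 3} - {j})
    first = sigma[j] * math.sqrt(1 - r(j, k)**2) * math.sqrt(1 - r(j, m, k)**2)
    second = sigma[j] * math.sqrt(1 - r(j, m)**2) * math.sqrt(1 - r(j, k, m)**2)
    return first, second

sd = {}
for j in (1, 2, 3):
    first, second = second_order_sd(j)
    assert math.isclose(first, second)  # the arithmetical check
    sd[j] = first

# b_{jk.m} = r_{jk.m} * sigma_{j.km} / sigma_{k.jm}
b = {}
for j in (1, 2, 3):
    for k in (1, 2, 3):
        if j != k:
            m = ({1, 2, 3} - {j, k}).pop()
            b[(j, k)] = r(j, k, m) * sd[j] / sd[k]

print({j: round(s, 2) for j, s in sd.items()})
print({f"b{j}{k}": round(v, 2) for (j, k), v in b.items()})
```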

The first and second regression-equations are those of most practical importance. The argument has been advanced that the giving of out-relief tends to lower earnings, and the total coefficient ($r_{13} = -0.13$) between earnings ($X_1$) and out-relief ($X_3$), though very small (cf. Chap. IX. § 17), does not seem inconsistent with such a hypothesis. The partial correlation coefficient ($r_{13.2} = +0.44$) and the regression-equation (1), however, indicate that in unions with a given percentage of the population in receipt of relief ($X_2$) the earnings are highest where the proportion of out-relief is highest; and this is, in so far, against the hypothesis of a tendency to lower wages. It remains possible, of course, that out-relief may adversely affect the possibility of earning, e.g. by limiting the employment of the old. As regards pauperism, the argument might be advanced that the observed correlation ($r_{23} = +0.60$) between pauperism and out-relief was in part due to the negative correlation ($r_{13} = -0.13$) between earnings and out-relief. Such a hypothesis would have little to support it in view of the smallness and doubtful significance of $r_{13}$, and is definitely contradicted by the positive partial correlation $r_{23.1} = +0.69$ and by the second regression-equation. The third regression-equation shows that the proportion of out-relief is on the whole highest where earnings are highest and pauperism greatest. It should be noticed, however, that a negative ratio is clearly impossible, and consequently the relation cannot be strictly linear; but the third equation gives possible (positive) average ratios for all the combinations of pauperism and earnings that actually occur.

Example ii.—(Four variables.) As an illustration of the form
of the work in the case of four variables, we will take a portion
of the data from another investigation into the causation of
pauperism, viz. that described in the first illustration of Chapter X.,
to which the student should refer for details. The variables are
the ratios of the values in 1891 to the values in 1881 (taken as
100) of—

1. The percentage of the population in receipt of relief,

2. The ratio of the numbers given outdoor relief to the numbers
relieved in the workhouse,

3. The percentage of the population over 65 years of age,

        <pb n="268" />

4. The population itself,

in the metropolitan group of 32 unions, and the fundamental constants (means, standard-deviations and correlations) are as follows:

TABLE I.

Variable   Mean     Standard-deviation   |   Coefficient   Value    log √(1 − r²)
1          104.7    29.2                 |   r12           +0.52    1̄.93154
2          90.6     41.7                 |   r13           +0.41    1̄.96003
3          107.7    5.5                  |   r14           −0.14    1̄.99570
4          111.3    23.8                 |   r23           +0.49    1̄.94038
                                         |   r24           +0.23    1̄.98820
                                         |   r34           +0.25    1̄.98598
It is seen that the average changes are not great: the percentages of the population in receipt of relief have increased on an average by 4.7 per cent., the out-relief ratio has dropped by 9.4 per cent., and the percentage of old has increased by 7.7 per cent., at the same time as the population of the unions has risen on the average by 11.3 per cent. The standard-deviations of the first, second, and fourth variables, however, are very large. As a matter of fact, while in one union the pauperism decreased by nearly 50 per cent. and in others by 20 per cent., in some there were increases of 60, 80, and 90 per cent.; similarly, in the case of the out-relief, in several unions the ratio was decreased by 40 to 60 per cent., a consistent anti-out-relief policy having been enforced; in others the ratio was doubled, and more than doubled. As regards population, the more central districts show decreases ranging up to 20 and 25 per cent., the circumferential districts increases of 45 to 80 per cent. The correlations of order zero are not large, the changes in the rate of pauperism exhibiting the highest correlation with changes in the out-relief ratio, slightly less with changes in the proportion of old, and very little with changes in population.

The correlations of the second order are obtained in two steps.
In the first place, the six coefficients of order zero are grouped in
four sets of three, corresponding to the four sets of three variables
formed by omitting each one of the four variables in turn (Table II. col. 1). Each of these sets of three coefficients is then
treated in the same manner as in the last example, and so the

        <pb n="269" />
TABLE II.

(1)                       (2)              (3)          (4)                       (5)
Correlation-coefficient   Product term     Numerator    Correlation-coefficient   log √(1 − r²)
(order zero)              of numerator                  (first order)
r12   +0.52               +0.2009          +0.3191      r12.3   +0.4013           1̄.96187
r13   +0.41               +0.2548          +0.1552      r13.2   +0.2084           1̄.99035
r23   +0.49               +0.2132          +0.2768      r23.1   +0.3553           1̄.97070

r12   +0.52               −0.0322          +0.5522      r12.4   +0.5731           1̄.91355
r14   −0.14               +0.1196          −0.2596      r14.2   −0.3123           1̄.97772
r24   +0.23               −0.0728          +0.3028      r24.1   +0.3580           1̄.97022

r13   +0.41               −0.0350          +0.4450      r13.4   +0.4642           1̄.94731
r14   −0.14               +0.1025          −0.2425      r14.3   −0.2746           1̄.98297
r34   +0.25               −0.0574          +0.3074      r34.1   +0.3404           1̄.97326

r23   +0.49               +0.0575          +0.4325      r23.4   +0.4590           1̄.94863
r24   +0.23               +0.1225          +0.1075      r24.3   +0.1274           1̄.99645
r34   +0.25               +0.1127          +0.1373      r34.2   +0.1618           1̄.99424

TABLE III.

(1)                       (2)              (3)          (4)                       (5)
Correlation-coefficient   Product term     Numerator    Correlation-coefficient   log √(1 − r²)
(first order)             of numerator                  (second order)
r12.4   +0.5731           +0.2131          +0.3600      r12.34   +0.457           1̄.94901
r13.4   +0.4642           +0.2631          +0.2011      r13.24   +0.276           1̄.98277
r23.4   +0.4590           +0.2660          +0.1930      r23.14   +0.266           1̄.98408

r12.3   +0.4013           −0.0350          +0.4363      r12.34   +0.457           —
r14.3   −0.2746           +0.0511          −0.3257      r14.23   −0.359           1̄.97013
r24.3   +0.1274           −0.1102          +0.2376      r24.13   +0.270           1̄.98359

r13.2   +0.2084           −0.0505          +0.2589      r13.24   +0.276           —
r14.2   −0.3123           +0.0337          −0.3460      r14.23   −0.359           —
r34.2   +0.1618           −0.0651          +0.2269      r34.12   +0.244           [illegible]

r23.1   +0.3553           +0.1219          +0.2334      r23.14   +0.266           —
r24.1   +0.3580           +0.1209          +0.2371      r24.13   +0.270           —
r34.1   +0.3404           +0.1272          +0.2132      r34.12   +0.244           —

        <pb n="270" />

correlations of the first order (Table II. col. 4) are obtained. The first-order coefficients are then regrouped in sets of three, with the same secondary suffix (Table III. col. 1), and these are treated precisely in the same way as the coefficients of order zero. In this way, it will be seen, the value of each coefficient of the second order is arrived at in two ways independently, and so the arithmetic is checked: $r_{12.34}$ occurs in the first and fourth lines, for instance, $r_{13.24}$ in the second and seventh, and so on. Of course slight differences may occur in the last digit if a sufficient number of digits is not retained, and for this reason the intermediate work should be carried to a greater degree of accuracy than is necessary in the final result; thus four places of decimals were retained throughout in the intermediate work of this example, and three in the final result. If he carries out an independent calculation, the student may find that his logarithms differ slightly from those given in this and the following work, if more or fewer figures are retained.
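The two-stage scheme of Tables II. and III. can be mirrored in a few lines. In this illustrative sketch the first-order coefficients are computed from Table I, and each second-order coefficient is then obtained from both of its groups of three in Table III., so that the two values can be compared as in the text.

```python
import math
from itertools import combinations

R0 = {
    frozenset((1, 2)): 0.52, frozenset((1, 3)): 0.41, frozenset((1, 4)): -0.14,
    frozenset((2, 3)): 0.49, frozenset((2, 4)): 0.23, frozenset((3, 4)): 0.25,
}
VARS = (1, 2, 3, 4)

def step(r_ij, r_ik, r_jk):
    # one application of equation (12)
    return (r_ij - r_ik * r_jk) / math.sqrt((1 - r_ik**2) * (1 - r_jk**2))

# Table II: first-order coefficients r_{ij.k}
first = {}
for i, j in combinations(VARS, 2):
    for k in set(VARS) - {i, j}:
        first[(i, j, k)] = step(R0[frozenset((i, j))], R0[frozenset((i, k))], R0[frozenset((j, k))])

def r1(a, b, k):
    """First-order r_{ab.k}, whatever the order of a and b."""
    return first[(min(a, b), max(a, b), k)]

# Table III: each r_{ij.km} two ways
second = {}
for i, j in combinations(VARS, 2):
    k, m = sorted(set(VARS) - {i, j})
    from_k_group = step(r1(i, j, k), r1(i, m, k), r1(j, m, k))  # secondary suffix k, eliminate m
    from_m_group = step(r1(i, j, m), r1(i, k, m), r1(j, k, m))  # secondary suffix m, eliminate k
    second[f"r{i}{j}.{k}{m}"] = (from_k_group, from_m_group)

for name, (a, b) in second.items():
    print(name, round(a, 3), round(b, 3))
```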

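Having obtained the correlations, the regressions can be calculated from the third-order standard-deviations by equations of the form (as in the last example)

$$b_{12.34} = r_{12.34} \cdot \frac{\sigma_{1.234}}{\sigma_{2.134}}$$

so the standard-deviations of lower orders need not be evaluated. Using equations of the form

$$\sigma_{1.234} = \sigma_1\sqrt{1 - r_{12}^2}\,\sqrt{1 - r_{13.2}^2}\,\sqrt{1 - r_{14.23}^2}$$

$$= \sigma_1\sqrt{1 - r_{14}^2}\,\sqrt{1 - r_{13.4}^2}\,\sqrt{1 - r_{12.34}^2}$$

we find

$\log\sigma_{1.234} = 1.35740$    $\sigma_{1.234} = 22.8$
$\log\sigma_{2.134} = 1.50597$    $\sigma_{2.134} = 32.1$
$\log\sigma_{3.124} = 0.65773$    $\sigma_{3.124} = 4.55$
$\log\sigma_{4.123} = 1.32914$    $\sigma_{4.123} = 21.3$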
All the twelve regressions of the second order can be readily calculated, given these standard-deviations and the correlations, but we may confine ourselves to the equation giving the changes in pauperism ($X_1$) in terms of the other variables as the most important. It will be found to be

$$x_1 = 0.325x_2 + 1.383x_3 - 0.383x_4$$

or, transferring the origins and expressing the equation in terms of percentage-ratios,

$$X_1 = -31.1 + 0.325X_2 + 1.383X_3 - 0.383X_4$$
        <pb n="271" />
or, again, in terms of percentage-changes (ratio − 100):

Percentage change in pauperism
= +1.4 per cent.
+ 0.325 times the change in out-relief ratio
+ 1.383 times the change in proportion of old
− 0.383 times the change in population.
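The pauperism equation can be reproduced from Table I. as a sketch. The third-order standard-deviations come from (10), and the coefficients from $b_{1k.\ldots} = r_{1k.\ldots}\,\sigma_{1.234}/\sigma_{k.\ldots}$. The constant terms use the means and the three-decimal coefficients, as in the text. $\sigma_{4.123} = 21.3$ is taken as printed above rather than recomputed. The helper names are our own.

```python
import math

R0 = {
    frozenset((1, 2)): 0.52, frozenset((1, 3)): 0.41, frozenset((1, 4)): -0.14,
    frozenset((2, 3)): 0.49, frozenset((2, 4)): 0.23, frozenset((3, 4)): 0.25,
}
mean = {1: 104.7, 2: 90.6, 3: 107.7, 4: 111.3}
sigma = {1: 29.2, 2: 41.7, 3: 5.5}
sd = {4: 21.3}  # sigma_{4.123}, taken as printed in the text

def partial(i, j, given):
    """r_{ij.given} by repeated use of equation (12)."""
    if not given:
        return R0[frozenset((i, j))]
    *rest, k = given
    r_ij, r_ik, r_jk = partial(i, j, rest), partial(i, k, rest), partial(j, k, rest)
    return (r_ij - r_ik * r_jk) / math.sqrt((1 - r_ik**2) * (1 - r_jk**2))

def third_order_sd(j):
    """sigma_j with the other three variables eliminated, by equation (10)."""
    others = [k for k in (1, 2, 3, 4) if k != j]
    var = sigma[j] ** 2
    for step, k in enumerate(others):
        var *= 1 - partial(j, k, others[:step]) ** 2
    return math.sqrt(var)

for j in (1, 2, 3):
    sd[j] = third_order_sd(j)

# b_{1k.(rest)} = r_{1k.(rest)} * sigma_{1.234} / sigma_{k.(1 and rest)}
b = {}
for k in (2, 3, 4):
    rest = [m for m in (2, 3, 4) if m != k]
    b[k] = partial(1, k, rest) * sd[1] / sd[k]

# Constant terms from the means and the three-decimal coefficients of the text
printed = {2: 0.325, 3: 1.383, 4: -0.383}
constant = mean[1] - sum(printed[k] * mean[k] for k in printed)
change_constant = constant + 100 * sum(printed.values()) - 100
print({k: round(v, 4) for k, v in b.items()})
print(round(constant, 1), round(change_constant, 1))
```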

These results render the interpretation of the total coefficients,
which might be equally consistent with several hypotheses, more
clear and definite. The questions would arise, for instance,
whether the correlation of changes in pauperism with changes in
out-relief might not be due to correlation of the latter with the
other factors introduced, and whether the negative correlation with
changes in population might not be due solely to the correlation
of the latter with changes in the proportion of old. As a matter
of fact, the partial correlations of changes in pauperism with
changes in out-relief and in proportion of old are slightly less than
the total correlations, but the partial correlation with changes in
population is numerically greater, the figures being

$r_{12} = +0.52$    $r_{12.34} = +0.46$
$r_{13} = +0.41$    $r_{13.24} = +0.28$
$r_{14} = -0.14$    $r_{14.23} = -0.36$

So far, then, as we have taken the factors of the case into
account, there appears to be a true correlation between changes
in pauperism and changes in out-relief, proportion of old, and
population—the latter serving, of course, as some index to
changes in general prosperity. The relative influences of the
three factors are indicated by the regression-equation above.
[For the full discussion of the case cf. Jour. Roy. Stat. Soc.
vol. lxii., 1899.]

15. The correlation between pauperism and labourers’ earnings exhibited by the figures of Example i. was illustrated by a diagram (fig. 40, p. 180), in which scales of “pauperism” and “earnings” were taken along two axes at right angles, and every observed pair of values was entered by marking the corresponding point with a small circle: the diagram was completed by drawing in the lines of regression. In precisely the same way the correlation between three variables may be represented by a model showing the distribution of points in space; for any set of observed values $X_1$, $X_2$, $X_3$ may be regarded as determining a point in space, just as any pair of values $X_1$ and $X_2$ may be regarded as determining a point in a plane. Fig. 45 is drawn from such a model, constructed from the data of Example i. Four pieces of wood are fixed together

        <pb n="272" />
like the bottom and three sides of a box. Supposing the open
side to face the observer, a scale of pauperism is drawn vertically
upwards along the left-hand angle at the back of the “box,” the
[Figure. Fig. 45.—Model illustrating the Correlation between three Variables: (1) Pauperism (percentage of the population in receipt of Poor-law relief); (2) Out-relief ratio (numbers given relief in their homes to one in the workhouse); (3) Average Weekly Earnings of agricultural labourers (data pp. 178 and 189). A, front view; B, view of model tilted till the plane of regression for pauperism on the two remaining variables is seen as a straight line.]

        <pb n="273" />
scale starting from zero, as very small values of pauperism occur; a scale of out-relief ratio is taken along the angle between the back and bottom of the box, starting from zero at the left; finally, the scale of earnings is drawn out towards the observer along the angle between the left-hand side and the bottom, but as earnings lower than 12s. do not occur, the scale may start from 12s. at the corner. Suitable scales are: pauperism, 1 in. = 1 per cent.; out-relief ratio, 1 in. = 1 unit; earnings, 1 in. = 1s.; and the inside measures of the model may then be 17 in. × 10 in. × 8 in. high, the dimensions of the model constructed. Given these three scales, any set of observed values determines a point within the
“box.” The earnings and out-relief ratio for some one union are
noted first, and the corresponding point marked on the baseboard ;
a steel wire is then inserted vertically in the base at this point
and cut off at the height corresponding, on the scale chosen, to
the pauperism in the same union, being finally capped with a
small ball or knob to mark the “point” clearly. The model
shows very well the general tendency of the pauperism to be the
higher the lower the wages and the higher the out-relief, for the
highest points lie towards the back and right-hand side of the
model. If some representation of all three equations of regression
were to be inserted in the model, the result would be rather
confusing ; so the most important equation, viz. the second, giving
the average rate of pauperism in terms of the other variables, may
be chosen. This equation represents a plane : the lines in which
it cuts the right- and left-hand sides of the “box” should be
marked, holes drilled at equal intervals on these lines on the
opposite sides of the box (the holes facing each other), and threads
stretched through these holes, thus outlining the plane as shown
in the figure. In the actual model the correlation-diagrams (like
fig. 40) corresponding to the three pairs of variables were drawn
on the back sides and base: they represent, of course, the eleva-
tions and plan of the points.

The student possessing some skill in handicraft would find it
worth while to make such a model for some case of interest to
himself, and to study on it thoroughly the nature of the plane of
regression, and the relations of the partial and total correlations.

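16. If we write

$$1 - R_{1(23\ldots n)}^2 = \frac{\sigma_{1.23\ldots n}^2}{\sigma_1^2} \qquad (13)$$

it may be shown that $R_{1(23\ldots n)}$ is the correlation between $x_1$ and the expression on the right-hand side of the regression-equation, say $e_{1(23\ldots n)}$, where

$$e_{1(23\ldots n)} = b_{12.34\ldots n}\,x_2 + b_{13.24\ldots n}\,x_3 + \cdots + b_{1n.23\ldots(n-1)}\,x_n \qquad (14)$$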
        <pb n="274" />
For we have

$$\Sigma(x_1 \cdot e_{1(23\ldots n)}) = \Sigma\{x_1(x_1 - x_{1.23\ldots n})\} = N(\sigma_1^2 - \sigma_{1.23\ldots n}^2)$$

and also

$$\Sigma(e_{1(23\ldots n)}^2) = \Sigma(x_1 - x_{1.23\ldots n})^2 = N(\sigma_1^2 - \sigma_{1.23\ldots n}^2)$$

whence the correlation between $x_1$ and $e_{1(23\ldots n)}$ is

$$\frac{(\sigma_1^2 - \sigma_{1.23\ldots n}^2)^{1/2}}{\sigma_1}$$

i.e. the value of $R_{1(23\ldots n)}$ given by (13). The value of R is accordingly a useful datum as indicating how closely $x_1$ can be expressed in terms of a linear function of $x_2, x_3, \ldots x_n$, and the values of the regressions may be regarded as determined by the condition that R shall be a maximum. Its value is essentially positive, as the product-sum $\Sigma(x_1 \cdot e_{1(23\ldots n)})$ is positive. R may be termed a coefficient of (n − 1)-fold (or double, triple, etc.) correlation; for n variables there are n such correlations, but in the limiting case of two variables the two are identical. The value may be readily calculated, either from $\sigma_{1.23\ldots n}$ and $\sigma_1$, or directly from the equation

$$1 - R_{1(23\ldots n)}^2 = (1 - r_{12}^2)(1 - r_{13.2}^2)(1 - r_{14.23}^2)\cdots(1 - r_{1n.23\ldots(n-1)}^2) \qquad (15)$$

It is obvious from this equation that, since every bracket on the right is not greater than unity,

$$1 - R_{1(23\ldots n)}^2 \le 1 - r_{12}^2$$

Hence $R_{1(23\ldots n)}$ cannot be numerically less than $r_{12}$. For the same reason, rewriting (15) in every possible form, $R_{1(23\ldots n)}$ cannot be numerically less than $r_{12}, r_{13}, \ldots r_{1n}$, i.e. any one of the possible constituent coefficients of order zero. Further, for similar reasons, $R_{1(23\ldots n)}$ cannot be numerically less than any possible constituent coefficient of any higher order. That is to say, $R_{1(23\ldots n)}$ is not numerically less than the greatest of all the possible constituent coefficients, and is usually, though not always, markedly greater. Thus in Example i., $R_{2(13)}$ (the coefficient of double correlation between pauperism on the one hand, out-relief and labourers’ earnings on the other) is 0.839, and the numerically greatest of the possible constituent coefficients is $r_{12.3} = -0.73$. Again, in Example ii., $R_{1(234)}$ is 0.626, and the numerically greatest of the possible constituent coefficients is $r_{12.4} = +0.573$.
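Equation (15) gives R without forming the regression-equation at all. The following sketch is illustrative, with function and variable names of our own. It evaluates $R_{2(13)}$ for Example i and $R_{1(234)}$ for Example ii from the zero-order coefficients.

```python
import math

def make_partial(r0):
    """Return a function computing r_{ij.given} from zero-order coefficients r0."""
    def partial(i, j, given):
        if not given:
            return r0[frozenset((i, j))]
        *rest, k = given
        r_ij, r_ik, r_jk = partial(i, j, rest), partial(i, k, rest), partial(j, k, rest)
        return (r_ij - r_ik * r_jk) / math.sqrt((1 - r_ik**2) * (1 - r_jk**2))
    return partial

def multiple_R(r0, target, predictors):
    """R_{target(predictors)} by equation (15)."""
    partial = make_partial(r0)
    one_minus_R2 = 1.0
    for step, k in enumerate(predictors):
        one_minus_R2 *= 1 - partial(target, k, list(predictors[:step])) ** 2
    return math.sqrt(1 - one_minus_R2)

example_i = {frozenset((1, 2)): -0.66, frozenset((1, 3)): -0.13, frozenset((2, 3)): 0.60}
example_ii = {
    frozenset((1, 2)): 0.52, frozenset((1, 3)): 0.41, frozenset((1, 4)): -0.14,
    frozenset((2, 3)): 0.49, frozenset((2, 4)): 0.23, frozenset((3, 4)): 0.25,
}
print(round(multiple_R(example_i, 2, (1, 3)), 3))      # pauperism on earnings and out-relief
print(round(multiple_R(example_ii, 1, (2, 3, 4)), 3))  # changes in pauperism on the other three
```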

The student should notice that R is necessarily positive. Further, even if all the variables $X_1, X_2, \ldots X_n$ were strictly uncorrelated in the original universe as a whole, we should expect $r_{12}$, $r_{13.2}$, $r_{14.23}$, etc., to exhibit values (whether positive or negative)

        <pb n="275" />
differing from zero in a limited sample. Hence R will not tend, on an average of such samples, to be zero, but will fluctuate round some positive mean value. This mean value will be the greater the smaller the number of observations in the sample, and also the greater the number of variables. When only a small number of observations are available it is, accordingly, of little use to deal with a large number of variables. As a limiting case, it is evident that if we deal with n variables and possess only n observations, all the partial correlations of the highest possible order will be unity.
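The upward bias of R in small samples is easily seen by simulation. The sketch below rests on assumptions of our own: independent normal variables, a fixed random seed, and 2,000 trials. It computes $R_{1(23)}$ from sample correlations by (15) and averages it for samples of different sizes. It also checks the limiting case, in which three variables with only three observations give $R = 1$ and $r_{13.2} = \pm 1$.

```python
import math
import random

def corr(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def R_and_partial(x1, x2, x3):
    """R_{1(23)} by equation (15), together with r_{13.2}."""
    r12, r13, r23 = corr(x1, x2), corr(x1, x3), corr(x2, x3)
    r13_2 = (r13 - r12 * r23) / math.sqrt((1 - r12**2) * (1 - r23**2))
    one_minus_R2 = (1 - r12**2) * (1 - r13_2**2)
    return math.sqrt(max(0.0, 1 - one_minus_R2)), r13_2  # max() guards against rounding

def mean_R(n_obs, trials=2000, seed=1):
    """Average R over samples of n_obs from three uncorrelated variables."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        sample = [[rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(3)]
        total += R_and_partial(*sample)[0]
    return total / trials

for n_obs in (5, 10, 50):
    print(n_obs, round(mean_R(n_obs), 3))

# Limiting case: three variables, three observations
rng = random.Random(7)
R, r13_2 = R_and_partial(*[[rng.gauss(0, 1) for _ in range(3)] for _ in range(3)])
print(round(R, 6), round(abs(r13_2), 6))
```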

17. It is obvious that as equations (11) and (12) enable us to express regressions and correlations of higher orders in terms of those of lower orders, we must similarly be able to express the coefficients of lower in terms of those of higher orders. Such expressions are sometimes useful for theoretical work. Using the same method of expansion as in previous cases, we have

$$0 = \Sigma(x_{1.23\ldots n} \cdot x_{2.34\ldots(n-1)})$$

$$= \Sigma(x_1 \cdot x_{2.34\ldots(n-1)}) - b_{12.34\ldots n}\,\Sigma(x_2 \cdot x_{2.34\ldots(n-1)}) - b_{1n.23\ldots(n-1)}\,\Sigma(x_n \cdot x_{2.34\ldots(n-1)})$$

the terms in $x_3$ to $x_{n-1}$ vanishing. That is, dividing through by $N\sigma_{2.34\ldots(n-1)}^2$,

$$b_{12.34\ldots(n-1)} = b_{12.34\ldots n} + b_{1n.23\ldots(n-1)} \cdot b_{n2.34\ldots(n-1)}$$

In this equation the coefficient on the left and the last on the right are of order n − 3, the other two of order n − 2. We therefore wish to eliminate the last coefficient on the right. Interchanging the suffixes 1 for n and n for 1, we have

$$b_{n2.34\ldots(n-1)} = b_{n2.13\ldots(n-1)} + b_{n1.23\ldots(n-1)} \cdot b_{12.34\ldots(n-1)}$$

Substituting this value for $b_{n2.34\ldots(n-1)}$ in the first equation we have

$$b_{12.34\ldots(n-1)} = \frac{b_{12.34\ldots n} + b_{1n.23\ldots(n-1)} \cdot b_{n2.13\ldots(n-1)}}{1 - b_{1n.23\ldots(n-1)} \cdot b_{n1.23\ldots(n-1)}} \qquad (16)$$

This is the required equation for the regressions; it is the equation

$$b_{12} = \frac{b_{12.n} + b_{1n.2} \cdot b_{n2.1}}{1 - b_{1n.2} \cdot b_{n1.2}}$$

with secondary suffixes 34 . . . (n − 1) added throughout. The corresponding equation for the correlations is obtained at once by writing down equation (16) for $b_{21.34\ldots(n-1)}$ and taking the square root of the product (cf. § 13); this gives

$$r_{12.34\ldots(n-1)} = \frac{r_{12.34\ldots n} + r_{1n.23\ldots(n-1)} \cdot r_{2n.13\ldots(n-1)}}{\sqrt{(1 - r_{1n.23\ldots(n-1)}^2)(1 - r_{2n.13\ldots(n-1)}^2)}} \qquad (17)$$

        <pb n="276" />
which is similarly the equation

$$r_{12} = \frac{r_{12.n} + r_{1n.2} \cdot r_{2n.1}}{\sqrt{(1 - r_{1n.2}^2)(1 - r_{2n.1}^2)}}$$

with the secondary suffixes 34 . . . (n − 1) added throughout.
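Equation (17) in its simplest form can be checked against Example i: starting from the three first-order coefficients, it should return the zero-order $r_{12}$. The sketch below is an illustration only. It passes from order zero to the first order by (12) and back again by (17).

```python
import math

def to_partial(r12, r13, r23):
    """First-order coefficients r_{12.3}, r_{13.2}, r_{23.1} by equation (12)."""
    def p(a, b, c):  # r_{ab.c} from r_ab, r_ac, r_bc
        return (a - b * c) / math.sqrt((1 - b**2) * (1 - c**2))
    return p(r12, r13, r23), p(r13, r12, r23), p(r23, r12, r13)

def to_total(r12_3, r13_2, r23_1):
    """Zero-order r_{12} from the first-order coefficients by equation (17)."""
    return (r12_3 + r13_2 * r23_1) / math.sqrt((1 - r13_2**2) * (1 - r23_1**2))

r12_3, r13_2, r23_1 = to_partial(-0.66, -0.13, 0.60)
print(round(r12_3, 4), round(r13_2, 4), round(r23_1, 4))
print(round(to_total(r12_3, r13_2, r23_1), 6))
```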

18. Equations (12) and (17) imply that certain limiting inequalities must hold between the correlation-coefficients in the expression on the right in each case in order that real values (values between ±1) may be obtained for the correlation-coefficient on the left. These inequalities correspond precisely with those “conditions of consistence” between class-frequencies with which we dealt in Chapter II., but we propose to treat them only briefly here. Writing (12) in its simplest form for $r_{12.3}$, we must have $r_{12.3}^2 \le 1$, or

$$\frac{(r_{12} - r_{13}r_{23})^2}{(1 - r_{13}^2)(1 - r_{23}^2)} \le 1$$

that is,

$$r_{12}^2 + r_{13}^2 + r_{23}^2 - 2r_{12}r_{13}r_{23} \le 1 \qquad (18)$$

if the three r's are consistent with each other. If we take $r_{13}$, $r_{23}$ as known, this gives as limits for $r_{12}$

$$r_{13}r_{23} \pm \sqrt{1 - r_{13}^2 - r_{23}^2 + r_{13}^2 r_{23}^2}$$

Similarly, writing (17) in its simplest form for $r_{12}$ in terms of $r_{12.3}$, $r_{13.2}$ and $r_{23.1}$, we must have

$$r_{12.3}^2 + r_{13.2}^2 + r_{23.1}^2 + 2r_{12.3}r_{13.2}r_{23.1} \le 1 \qquad (19)$$

and therefore, if $r_{13.2}$ and $r_{23.1}$ are given, $r_{12.3}$ must lie between the limits

$$-r_{13.2}r_{23.1} \pm \sqrt{1 - r_{13.2}^2 - r_{23.1}^2 + r_{13.2}^2 r_{23.1}^2}$$

The following table gives the limits of the third coefficient in a few special cases, for the three coefficients of order zero and of the first order respectively:

Value of r12 or r12.3 | Value of r13 or r13.2 | Limits of r23 | Limits of r23.1
0                     | 0                     | ±1            | ±1
±1                    | ±1                    | +1            | −1
±1                    | ∓1                    | −1            | +1
±√0.5                 | ±√0.5                 | 0 and +1      | −1 and 0
±√0.5                 | ∓√0.5                 | −1 and 0      | 0 and +1

        <pb n="277" />
The student should notice that the set of three coefficients of order zero and value unity are only consistent if either one only, or all three, are positive, i.e. +1, +1, +1, or −1, −1, +1; but not −1, −1, −1. On the other hand, the set of three coefficients of the first order and value unity are only consistent if one only, or all three, are negative: the only consistent sets are +1, +1, −1 and −1, −1, −1. The values of the two given r's need to be very high if even the sign of the third is to be inferred; if the two are equal, they must be at least equal to √0.5, or 0.707 . . . . Finally, it may be noted that no two values for the known coefficients ever permit an inference of the value zero for the third; the fact that 1 and 2, 1 and 3 are uncorrelated, pair and pair, permits no inference of any kind as to the correlation between 2 and 3, which may lie anywhere between +1 and −1.
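The limits for a third coefficient can be tabulated mechanically. The following sketch is our own illustration. It implements the two pairs of limits derived from (18) and (19); since both inequalities are symmetric in the three coefficients, either function gives the range of whichever coefficient is unknown. The loop reproduces the rows of the table above.

```python
import math

def limits_zero_order(ra, rb):
    """Admissible range of the third zero-order coefficient, from inequality (18)."""
    half_width = math.sqrt((1 - ra**2) * (1 - rb**2))
    return ra * rb - half_width, ra * rb + half_width

def limits_first_order(ra, rb):
    """Admissible range of the third first-order coefficient, from inequality (19)."""
    half_width = math.sqrt((1 - ra**2) * (1 - rb**2))
    return -ra * rb - half_width, -ra * rb + half_width

s = math.sqrt(0.5)
for given in [(0.0, 0.0), (1.0, 1.0), (1.0, -1.0), (s, s), (s, -s)]:
    lo0, hi0 = limits_zero_order(*given)
    lo1, hi1 = limits_first_order(*given)
    print(given, (round(lo0, 3), round(hi0, 3)), (round(lo1, 3), round(hi1, 3)))
```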
19. We do not think it necessary to add to this chapter a detailed discussion of the nature of fallacies on which the theory of multiple correlation throws much light. The general nature of such fallacies is the same as for the case of attributes, and was discussed fully in Chap. IV. §§ 1-8. It suffices to point out the principal sources of fallacy which are suggested at once by the form of the partial correlation

$$r_{12.3} = \frac{r_{12} - r_{13}r_{23}}{\sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}} \qquad (a)$$

and by the form of the corresponding expression for $r_{12}$ in terms of the partial coefficients

$$r_{12} = \frac{r_{12.3} + r_{13.2}r_{23.1}}{\sqrt{(1 - r_{13.2}^2)(1 - r_{23.1}^2)}} \qquad (b)$$

From the form of the numerator of (a) it is evident (1) that even if $r_{12}$ be zero, $r_{12.3}$ will not be zero unless either $r_{13}$ or $r_{23}$, or both, are zero. If $r_{13}$ and $r_{23}$ are of the same sign the partial correlation will be negative; if of opposite sign, positive. Thus the quantity of a crop might appear to be unaffected, say, by the amount of rainfall during some period preceding harvest: this might be due merely to a correlation between rain and low temperature, the partial correlation between crop and rainfall being positive and important. We may thus easily misinterpret a coefficient of correlation which is zero. (2) $r_{12.3}$ may be, indeed often is, of opposite sign to $r_{12}$, and this may lead to still more serious errors of interpretation.

From the form of the numerator of (b), on the other hand, we see that, conversely, $r_{12}$ will not be zero even though $r_{12.3}$ is zero, unless either $r_{13.2}$ or $r_{23.1}$ is zero. This corresponds to the theorem

        <pb n="278" />
of Chap. IV. § 6, and indicates a source of fallacies similar to
those there discussed.
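The crop-and-rainfall illustration can be made concrete with hypothetical figures that do not come from any data in the text. Suppose crop (1) and rainfall (2) show $r_{12} = 0$, rain goes with low temperature, so that $r_{23} = -0.5$, and warmth favours the crop, so that $r_{13} = +0.6$. The sketch below evaluates (a) for these assumed values, and then for a same-sign case for contrast.

```python
import math

def partial_a(r12, r13, r23):
    """Equation (a): r_{12.3} from the three zero-order coefficients."""
    return (r12 - r13 * r23) / math.sqrt((1 - r13**2) * (1 - r23**2))

# Hypothetical values: 1 = crop, 2 = rainfall, 3 = temperature
r12, r13, r23 = 0.0, 0.6, -0.5
print(round(partial_a(r12, r13, r23), 3))

# If r13 and r23 had the same sign, the partial correlation would be negative
print(round(partial_a(0.0, 0.6, 0.5), 3))
```

A zero total correlation thus hides a partial correlation of about +0.43 under these assumed figures.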

20. We have seen (§ 9) that r₁₂.₃ is the correlation between x₁.₃
and x₂.₃, and that we might determine the value of this partial
correlation by drawing up the actual correlation table for the two
residuals in question. Suppose, however, that instead of drawing
up a single table we drew up a series of tables for values of x₁.₃
and x₂.₃ associated with values of x₃ lying within successive
class-intervals of its range. In general the value of r₁₂.₃ would
not be the same (or approximately the same) for all such tables,
but would exhibit some systematic change as the value of x₃
increased. Hence r₁₂.₃ should be regarded, in general, as of the
nature of an average correlation: the cases in which it measures
the correlation between x₁.₃ and x₂.₃ for every value of x₃ (cf.
Chap. XVI.) are probably exceptional. The process for determining
partial associations (cf. Chap. IV.) is, it will be remembered,
thorough and complete, as we always obtain the actual tables
exhibiting the association between, say, A and B in the universe
of C’s and the universe of γ’s: that these two associations may
differ materially is illustrated by Example i. of Chap. IV.
(pp. 45-6). It might sometimes serve as a useful check on
partial-correlation work to reclassify the observations by the
fundamental methods of that chapter. For the general case an
extension of the method of the “correlation-ratio” (Chap. X., § 20)
might be useful, though exceedingly laborious. It is actually
employed in the paper cited in ref. 7, and the theory is more fully
developed in ref. 8.
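The identification of r₁₂.₃ with the correlation between the residuals x₁.₃ and x₂.₃ (§ 9) can be confirmed on any set of figures. In the Python sketch below, which is illustrative only, the six observations are invented and the helper names are hypothetical: x₁ and x₂ are each fitted to x₃ by a least-squares line, the residuals are correlated, and the result is compared with the formula for r₁₂.₃.

```python
from math import sqrt, isclose

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)

def corr(a, b):
    return cov(a, b) / sqrt(cov(a, a) * cov(b, b))

def residuals(y, x):
    """Deviations of y from its least-squares regression line on x."""
    slope = cov(y, x) / cov(x, x)
    my, mx = mean(y), mean(x)
    return [yi - my - slope * (xi - mx) for yi, xi in zip(y, x)]

# Invented observations of three variables, for illustration only.
x1 = [3.0, 5.0, 4.0, 7.0, 6.0, 9.0]
x2 = [2.0, 3.0, 5.0, 4.0, 7.0, 8.0]
x3 = [1.0, 2.0, 2.0, 3.0, 4.0, 5.0]

r12, r13, r23 = corr(x1, x2), corr(x1, x3), corr(x2, x3)
from_formula = (r12 - r13 * r23) / sqrt((1 - r13 ** 2) * (1 - r23 ** 2))
from_residuals = corr(residuals(x1, x3), residuals(x2, x3))
print(isclose(from_formula, from_residuals))   # True
```

With these invented figures r₁₂ is positive (about 0.75) while r₁₂.₃ is negative (about -0.77), an instance of the reversal of sign noted above.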

REFERENCES.

The preceding chapter is written from the standpoint of refs. 3 and 4, and with the
notation and method of ref. 5. The theory of correlation for several variables was
developed by Edgeworth and Pearson (refs. 1 and 2) from the standpoint of the normal
distribution of frequency (cf. Chap. XVI.).

Theory.

(1) EDGEWORTH, F. Y., “On Correlated Averages,” Phil. Mag., 5th Series, vol. xxxiv.,
1892, p. 194.

(2) PEARSON, KARL, “Regression, Heredity, and Panmixia,” Phil. Trans. Roy. Soc.,
Series A, vol. clxxxvii., 1896, p. 253.

(3) YULE, G. U., “On the Significance of Bravais’ Formulae for Regression, etc., in the
case of Skew Correlation,” Proc. Roy. Soc., vol. lx., 1897, p. 477.

(4) YULE, G. U., “On the Theory of Correlation,” Jour. Roy. Stat. Soc., vol. lx., 1897,
p. 812.

(5) YULE, G. U., “On the Theory of Correlation for any number of Variables treated
by a New System of Notation,” Proc. Roy. Soc., Series A, vol. lxxix., 1907, p. 182.

(6) HOOKER, R. H., and G. U. YULE, “Note on Estimating the Relative Influence of
Two Variables upon a Third,” Jour. Roy. Stat. Soc., vol. lxix., 1906, p. 197.

(7) BROWN, J. W., M. GREENWOOD, and FRANCES WOOD, “A Study of Index-Correlations,”
Jour. Roy. Stat. Soc., vol. lxxvii., 1914, pp. 317-46. (The partial or
“solid” correlation-ratio is used.)

(8) ISSERLIS, L., “On the Partial Correlation-Ratio, Pt. I. Theoretical,” Biometrika,
vol. x., 1914, pp. 391-411.

        <pb n="279" />
        XII.—PARTIAL CORRELATION.
Illustrative Applications of Economic Interest.
(9) YULE, G. U., “An Investigation into the Causes of Changes in Pauperism in England,
etc.,” Jour. Roy. Stat. Soc., vol. lxii., 1899, p. 249.
(10) HOOKER, R. H., “The Correlation of the Weather and the Crops,” Jour. Roy. Stat.
Soc., vol. lxx., 1907, p. 1.
(11) SNOW, E. C., “The Application of the Method of Multiple Correlation to the
Estimation of Post-censal Populations,” Jour. Roy. Stat. Soc., vol. lxxiv., 1911, p. 575.
EXERCISES.
1. (Ref. 10.) The following means, standard-deviations, and correlations are
found for
X₁ = seed-hay crops in cwts. per acre,
X₂ = spring rainfall in inches,
X₃ = accumulated temperature above 42° F. in spring,
in a certain district of England during 20 years.
M₁ = 28.02    σ₁ = 4.42    r₁₂ = +0.80
M₂ = 4.91     σ₂ = 1.10    r₁₃ = -0.40
M₃ = 594      σ₃ = 85      r₂₃ = -0.56

Find the partial correlations and the regression-equation for hay-crop on spring
rainfall and accumulated temperature.

2. (The following figures must be taken as an illustration only : the data
on which they were based do not refer to uniform times or areas.)

X₁ = deaths of infants under 1 year per 1000 births in same year (infantile
mortality).
X₂ = proportion per thousand of married women occupied for gain.
X₃ = death-rate of persons over 5 years of age per 10,000.
X₄ = proportion per thousand of population living 2 or more to a room
(overcrowding).

Taking the figures below for 30 urban areas in England and Wales, find the
partial correlations and the regression-equation for infantile mortality on the
other factors.

M₁ = 164             σ₁ = 20.0            r₁₂ = +0.49    r₂₃ = +0.15
M₂ = 158             σ₂ = 74.9            r₁₃ = +0.78    r₂₄ = -0.37
M₃ = [illegible]     σ₃ = [illegible]     r₁₄ = +0.20    r₃₄ = +0.23
M₄ = 205             σ₄ = 130.0

3. If all the correlations of order zero are equal, say =r, what are the values
of the partial correlations of successive orders ?

Under the same condition, what is the limiting value of r if all the equal
correlations are negative and n variables have been observed?

4. What is the correlation between z, ,and I

5. Write down from inspection the values of the partial correlations for the
three variables

X₁, X₂, and X₃ = a.X₁ + b.X₂.

Check the answer to Qu. 7, Chap. XI, by working out the partial
correlations.

6. If the relation

a.x₁ + b.x₂ + c.x₃ = 0

holds for all sets of values of x₁, x₂, and x₃, what must the partial correlations
be?

Check the answer to Qu. 9, Chap. XI, by working out the partial
correlations.

        <pb n="280" />
        PART III.—THEORY OF SAMPLING.
CHAPTER XIII.
SIMPLE SAMPLING OF ATTRIBUTES.

1. The problem of the present Part—2. The two chief divisions of the theory
of sampling—3. Limitation of the discussion to the case of simple
sampling—4. Definition of the chance of success or failure of a given
event—5. Determination of the mean and standard-deviation of the
number of successes in n events—6. The same for the proportion of
successes in n events: the standard-deviation of simple sampling as a
measure of unreliability, or its reciprocal as a measure of precision—7.
Verification of the theoretical results by experiment—8. More detailed
discussion of the assumptions on which the formula for the standard-
deviation of simple sampling is based—9-10. Biological cases to
which the theory is directly applicable—11. Standard-deviation of
simple sampling when the numbers of observations in the samples
vary—12. Approximate value of the standard-deviation of simple
sampling, and relation between mean and standard-deviation, when
the chance of success or failure is very small—13. Use of the standard-
deviation of simple sampling, or standard error, for checking and
controlling the interpretation of statistical results.

1. ON several occasions in the preceding chapters it has been
pointed out that small differences between statistical measures like
percentages, averages, measures of dispersion and so forth cannot
in general be assumed to indicate the action of definite and assign-
able causes. Small differences may easily arise from indefinite
and highly complex causation such as determines the fluctuating
proportions of heads and tails in tossing a coin, of black balls in
drawing samples from a bag containing a mixture of black and
white balls, or of cards bearing measurements within some given
class-interval in drawing cards, say, from an anthropometric record.
In 100 throws of a coin, for example, we may have noted 56 heads
and only 44 tails, but we cannot conclude that the coin is biassed:
on repeating our throws we may get only 48 heads and 52 tails.
Similarly, if on measuring the statures of 1000 men in each of
two nations we find that the mean stature is slightly greater for
        <pb n="281" />
        XIII.—SIMPLE SAMPLING OF ATTRIBUTES.
nation A than for nation B, we cannot necessarily conclude that
the real mean stature is greater in the case of nation A: possibly
if the observations were repeated on different samples of 1000
men the ratio might be reversed.

2. The theory of such fluctuations may be termed the theory
of sampling, and there are two chief sections of the theory corre-
sponding to the theory of attributes and the theory of variables
respectively. In tossing a coin we only classify the results of the
tosses as heads or tails; in drawing balls from a mixture of black
and white balls, we only classify the balls drawn as black or as
white. These cases correspond to the theory of attributes, and
the general case may be represented as the drawing of a sample
from a universe containing both A’s and α’s, the number or
proportion of A’s in successive samples being observed. If, on the
other hand, we put in a bag a number of cards bearing different
values of some variable X and draw sample batches of cards, we
can form averages and measures of dispersion for the successive
batches, and these averages and measures of dispersion will vary
slightly from one batch to another. If associated measures of
two variables X and Y are recorded on each card, we can also form
correlation-coefficients for the different batches, and these will vary
in a similar manner. These cases correspond to the theory of
variables, and it is the function of the theory of sampling for such
cases to inform us as to the fluctuations to be expected in the
averages, measures of dispersion, correlation-coefficients, etc., in
successive samples. In the present and the three following
chapters the theory of sampling is dealt with for the case of
attributes alone. The theory is of great importance and interest,
not only from its applications to the checking and control of
statistical results, but also from the theoretical forms of frequency-
distribution to which it leads. Finally, in Chapter XVII. one or
two of the more important cases of the theory of sampling for
variables are briefly treated, the greater part of the theory, owing
to its difficulty, lying somewhat outside the limits of this work.

3. The theory of sampling attains its greatest simplicity if
every observation contributed to the sample may be regarded as
independent of every other. This condition of independence
holds good, e.g., for the tossing of a coin or the throwing of a die :
the result of any one throw or toss does not affect, and is un-
affected by, the results of the preceding and following tosses.
It does not hold good, on the other hand, for the drawing of balls
from a bag: if a ball be drawn from a bag containing 3 black
and 3 white balls, the remainder may be either 2 black and 3
white, or 2 white and 3 black, according as the first ball was
black or white. The result of drawing a second ball is therefore

        <pb n="282" />

dependent on the result of drawing the first. The disturbance
can only be eliminated by drawing from a bag containing a
number of balls that is infinitely large compared with the
total number drawn, or by returning each ball to the bag before
drawing the next. In this chapter our attention will be confined
to the case of independent sampling, as in coin-tossing or dice-
throwing—the simplest cases of an artificial kind suitable for
theoretical study and experimental verification. For brevity, we
may refer to such cases of sampling as simple sampling: the
implied conditions are discussed more fully in § 8 below.

4. If we may regard an ideal coin as a uniform, homogeneous
circular disc, there is nothing which can make it tend to fall more
often on the one side than on the other; we may expect, there-
fore, that in any long series of throws the coin will fall with
either face uppermost an approximately equal number of times,
or with, say, heads uppermost approximately half the times.
Similarly, if we may regard the ideal die as a perfect homogeneous
cube, it will tend, in any long series of throws, to fall with each
of its six faces uppermost an approximately equal number of
times, or with any given face uppermost one-sixth of the whole
number of times. These results are sometimes expressed by
saying that the chance of throwing heads (or tails) with a coin is
1/2, and the chance of throwing six (or any other face) with a die
is 1/6. To avoid speaking of such particular instances as coins
or dice, we shall in future, using terms which have become
conventional, refer to an event the chance of success of which is p
and the chance of failure q. Obviously p + q = 1.

5. Suppose we take N samples with n events in each. What
will be the values towards which the mean and standard-deviation
of the number of successes in a sample will tend? The mean is
given at once, for there are N.n events, of which approximately
pNn will be successes, and the mean number of successes in a
sample will therefore tend towards pn. As regards the standard-
deviation, consider first the single event (n = 1). The single
event may give either no successes or one success, and will tend
to give the former qN, the latter pN, times in N trials. Take
this frequency-distribution and work out the standard-deviation
of the number of successes for the single event, as in the case of
an arithmetical example:—

Frequency f.    Successes ξ.    fξ.     fξ².
qN              0               0       0
pN              1               pN      pN
Total N                         pN      pN

        <pb n="283" />

We have therefore M = p, and

$$\sigma_1^2 = p - p^2 = pq.$$

But the number of successes in a group of n such events is the
sum of successes for the single events of which it is composed,
and, all the events being independent, we have therefore, by the
usual rule for the standard-deviation of the sum of independent
variables (Chap. XI. § 2, equation (?)), σₙ being the standard-
deviation of the number of successes in n events,

$$\sigma_n^2 = npq \qquad (1)$$

This is an equation of fundamental importance in the theory of
sampling. The student should particularly bear in mind that
the standard-deviation of the number of successes, due to
fluctuations of simple sampling alone, in a group of n events
varies, not directly as n, but as the square root of n.
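Equation (1) can also be checked without throwing anything, by enumerating the exact distribution of the number of successes in n independent events (the binomial distribution). The Python sketch below does this with exact fractions; the function name is hypothetical, and n = 12, p = 1/6 are chosen only to match the dice of § 7.

```python
from fractions import Fraction
from math import comb

def binomial_moments(n, p):
    """Exact mean and variance of the number of successes in n independent events."""
    q = 1 - p
    probs = [comb(n, k) * p ** k * q ** (n - k) for k in range(n + 1)]
    mean = sum(k * pk for k, pk in enumerate(probs))
    var = sum(k * k * pk for k, pk in enumerate(probs)) - mean ** 2
    return mean, var

n, p = 12, Fraction(1, 6)
mean, var = binomial_moments(n, p)
print(mean, var)                  # 2 5/3
print(n * p, n * p * (1 - p))     # 2 5/3, that is pn and npq
```

The standard-deviation is then √(5/3), about 1.291.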

6. In lieu of recording the absolute number of successes in each
sample of n events, we might have recorded the proportion of
such successes, i.e. 1/nth of the number in each sample. As this
would amount to merely dividing all the figures of the original
record by n, the mean proportion of successes—or rather the value
towards which the mean tends to approach—must be p, and the
standard-deviation s of the proportion of successes must be given by

$$s^2 = \frac{\sigma_n^2}{n^2} = \frac{pq}{n} \qquad (2)$$

The standard-deviation of the proportion of successes in samples
of such independent events varies therefore inversely as the square
root of the number on which the proportion is calculated. Now
if we regard the observed proportion in any one sample as a
more or less unreliable determination of the true proportion in
a very large sample from the same material, the standard-devia-
tion of sampling may fairly be taken as a measure of the
unreliability of the determination—the greater the standard-
deviation, the greater the fluctuations of the observed proportion,
although the true proportion is the same throughout. The
reciprocal of the standard-deviation (1/s), on the other hand, may
be regarded as a measure of reliability, or, as it is sometimes
termed, precision, and consequently the reliability or precision of
an observed proportion varies as the square root of the number of
observations on which it is based. This is again a very important
rule with many practical applications, but the limitations of the
case to which it applies, and the exact conditions from which it
has been deduced, should be borne in mind. We return to this
point again below (§ 8 and Chap. XIV.).
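The rule that precision grows as the square root of the number of observations is easily tabulated. A minimal Python sketch of equation (2), with a hypothetical function name and p = 0.5 as for a coin:

```python
from math import sqrt

def se_proportion(p, n):
    """Standard-deviation of the proportion of successes, equation (2): sqrt(pq/n)."""
    return sqrt(p * (1 - p) / n)

# Four times as many observations halves s and doubles the precision 1/s.
for n in (100, 400, 1600):
    s = se_proportion(0.5, n)
    print(n, round(s, 4), round(1 / s, 1))
# 100 0.05 20.0
# 400 0.025 40.0
# 1600 0.0125 80.0
```

For n = 100 the standard-deviation of the proportion is 0.05, so the 56 heads in 100 throws mentioned in § 1 lie only 1.2 standard-deviations above one-half.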

7. Experiments in coin tossing, dice throwing, and so forth
have been carried out by various persons in order to obtain ex-

        <pb n="284" />

perimental verification of these results. The following will serve
as illustrations, but the student is strongly recommended to
carry out a few series of such experiments personally, in order to
acquire confidence in the use of the theory. It may be as well
to remark that if ordinary commercial dice are to be used for the
trials, care should be taken to see that they are fairly true cubes,
and the marks not cut very deeply. Cheap dice are generally
very much out of truth, and if the marks are deeply cut the
balance of the die may be sensibly affected. A convenient mode
of throwing a number of dice, suggested, we believe, by the late
Professor Weldon, is to roll them down an inclined gutter of
corrugated paper, so that they roll across the corrugations.

(1) (W. F. R. Weldon, cited by Professor F. Y. Edgeworth,
Encycl. Brit., 11th edn., vol. xxii. p. 394. Totals of the columns
in the table there given.)

Twelve dice were thrown 4096 times; a throw of 4, 5, or 6 points
reckoned a success, therefore p = q = 0.5. Theoretical mean M = 6;
theoretical value of the standard-deviation σ = √(0.5 × 0.5 × 12) =
1.732.

The following was the frequency-distribution observed:—

Successes.  Frequency.    Successes.  Frequency.
0           0             7           847
1           7             8           536
2           60            9           257
3           198           10          71
4           430           11          11
5           731           12          0
6           948           Total       4096

Mean M = 6.139, standard-deviation σ = 1.712. The proportion of
successes is 6.139/12 = 0.512 instead of 0.5.

(2) (W. F. R. Weldon, loc. cit., p. 400. Totals of columns of
the table given.)

Twelve dice were thrown 4096 times; only a throw of 6 was
counted a success, so p = 1/6, q = 5/6. Theoretical mean M = 2,
standard-deviation σ = √(1/6 × 5/6 × 12) = 1.291.

The following was the observed frequency-distribution :—

Successes.  Frequency.    Successes.  Frequency.
0           447           5           115
1           1145          6           24
2           1181          7           7
3           796           8           1
4           380           Total       4096

        <pb n="285" />
Mean M = 2.000, standard-deviation σ = 1.296. Actual proportion
of successes 2.000/12 = 0.1667, agreeing with the theoretical value
to the fourth place of decimals. Of course such very close
agreement is accidental, and not to be always expected.

(3) (G. U. Yule.) The following may be taken as an illustra-
tion based on a smaller number of observations. Three dice were
thrown 648 times, and the numbers of 5’s or 6’s noted at
each throw. p = 1/3, q = 2/3. Theoretical mean 1. Standard-
deviation, 0.816.

Frequency-distribution observed:—

Successes.  Frequency.
0           179
1           298
2           141
3           30
Total       648

M = 1.034, σ = 0.823. Actual proportion of successes 0.345.

For other illustrations, some of which are cited in the questions
at the end of this chapter, the student may be referred to the
list of references on p. 273. The student should notice that in
all the distributions given a range of six times the standard-
deviation includes either all, or the great bulk of, the observations,
as in most frequency-distributions of the same general form. We
shall make use of this rule below, § 13.
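The means and standard-deviations quoted for experiments (2) and (3), and the rule that a range of six standard-deviations takes in nearly all the observations, can be recomputed directly from the frequency-distributions. In the Python sketch below the helper names are hypothetical; the frequencies are those of the two tables above.

```python
from math import sqrt

def freq_moments(freqs):
    """Mean and standard-deviation of a frequency-distribution {value: frequency}."""
    total = sum(freqs.values())
    mean = sum(x * f for x, f in freqs.items()) / total
    var = sum(x * x * f for x, f in freqs.items()) / total - mean ** 2
    return mean, sqrt(var)

def share_within_3sd(freqs):
    """Proportion of the observations lying within three standard-deviations of the mean."""
    mean, sd = freq_moments(freqs)
    inside = sum(f for x, f in freqs.items() if 3 * sd >= abs(x - mean))
    return inside / sum(freqs.values())

experiments = {
    # Weldon (2): twelve dice, a six counted a success; n = 12, p = 1/6.
    "(2)": ({0: 447, 1: 1145, 2: 1181, 3: 796, 4: 380, 5: 115, 6: 24, 7: 7, 8: 1}, 12, 1 / 6),
    # Yule (3): three dice, a five or a six counted a success; n = 3, p = 1/3.
    "(3)": ({0: 179, 1: 298, 2: 141, 3: 30}, 3, 1 / 3),
}
for name, (freqs, n, p) in experiments.items():
    m, sd = freq_moments(freqs)
    theory = sqrt(n * p * (1 - p))
    print(name, round(m, 3), round(sd, 3), round(theory, 3), round(share_within_3sd(freqs), 3))
# (2) 2.0 1.296 1.291 0.992
# (3) 1.034 0.823 0.816 1.0
```

The columns are the observed mean, the observed standard-deviation, the theoretical standard-deviation √(npq), and the share of observations within three standard-deviations of the mean.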

8. In deducing the formulæ (1) and (2) for the standard-
deviations of simple sampling in the cases with which we have
been dealing, only one condition has been explicitly laid down as
necessary, viz. the independence of the several drawings, tossings,
or other events composing the sample. But in point of fact this
is not the only nor the most fundamental condition which has
been explicitly or implicitly assumed, and it is necessary to realise
all the conditions in order to grasp the limitations under which
alone the formulæ arrived at will hold. Suppose, for example,
that we observe among groups of 1000 persons, at different times
or in different localities, various percentages of individuals
possessing certain characteristics —dark hair, or blindness, or
insanity, and so forth. Under what conditions should we
expect the observed percentages to obey the law of sampling
that we have found, and show a standard-deviation given by
equation (2)?

(a) In the first place we have tacitly assumed throughout the
preceding work that our dice or our coins were the same set or
        <pb n="286" />

identically similar throughout the experiment, so that the chance
of throwing “heads” with the coins or, say, “six” with the dice
was the same throughout: we did not commence an experiment
with dice loaded in one way and later on take a fresh set of dice
loaded in another way. Consequently if formula (2) is to hold
good in our practical case of sampling there must not be a
difference in any essential respect—i.e. in any character that can
affect the proportion observed—between the localities from which
the observations are drawn, nor, if the observations have been
made at different epochs, must any essential change have taken
place during the period over which the observations are spread.
Where the causation of the character observed is more or less
unknown, it may, of course, be difficult or impossible to say what
differences or changes are to be regarded as essential, but, where
we have more knowledge, the condition laid down enables us to
exclude certain cases at once from the possible applications of
formula (1) or (2). Thus it is obvious that the theory of simple
sampling cannot apply to the variations of the death-rate in
localities with populations of different age and sex compositions,
nor to death-rates in a mixture of healthy and unhealthy districts,
nor to death-rates in successive years during a period of con-
tinuously improving sanitation. In all such cases variations
due to definite causes are superposed on the fluctuations of
sampling.

(b) In the second place, we have also tacitly assumed not
only that we were using the same set of coins or dice throughout,
so that the chances p and q were the same at every trial, but
also that all the coins and dice in the set used were identically
similar, so that the chances p and q were the same for every coin
or die. Consequently, if our formulæ are to apply in the practical
case of sampling, the conditions that regulate the appearance of
the character observed must not only be the same for every
sample, but also for every individual in every sample. This is
again a very marked limitation. To revert to the case of death-
rates, formulæ (1) and (2) would not apply to the numbers of
persons dying in a series of samples of 1000 persons, even if these
samples were all of the same age and sex composition, and living
under the same sanitary conditions, unless, further, each sample
only contained persons of one sex and one age. For if each
sample included persons of both sexes and different ages, the
condition would be broken, the chance of death during a given
period not being the same for the two sexes, nor for the young
and the old. The groups would not be homogeneous in the sense
required by the conditions from which our formulæ have been
deduced. Similarly, if we were observing hair-colours, our formulæ

        <pb n="287" />
would not apply if the samples were compounded by always
taking one person from district A, another from district B, and
so on, these districts not being similar as regards the distribution
of hair-colour.
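The text states only that formulæ (1) and (2) cease to apply when conditions (a) or (b) fail. A short exact calculation, added here as an illustration and not taken from the text, shows which way the error goes in the simplest case. If the chance differs from sample to sample (breach of (a)), the variance of the number of successes exceeds npq. If the chance differs between the individuals of every sample in the same way (breach of (b)), the variance is the sum of the separate pq's, which is less than npq reckoned from the average chance. The chances 0.1 and 0.5 and the function names below are hypothetical.

```python
def var_between_samples(n, chances):
    """Condition (a) broken: each sample has one chance for all its n members,
    and the listed chances are equally frequent among the samples."""
    pbar = sum(chances) / len(chances)
    within = sum(n * p * (1 - p) for p in chances) / len(chances)
    between = sum((n * p - n * pbar) ** 2 for p in chances) / len(chances)
    return within + between

def var_within_sample(chances):
    """Condition (b) broken: member i of every sample has chance chances[i]."""
    return sum(p * (1 - p) for p in chances)

n = 10
mixed = [0.1] * 5 + [0.5] * 5          # hypothetical chances for the 10 members
pbar = sum(mixed) / n                  # 0.3, the average chance
print(round(n * pbar * (1 - pbar), 4))               # 2.1, npq with p = 0.3
print(round(var_between_samples(n, [0.1, 0.5]), 4))  # 5.7, larger than npq
print(round(var_within_sample(mixed), 4))            # 1.7, smaller than npq
```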

The above conditions were only tacitly assumed in our previous
work, and consequently it has been necessary to emphasise them
specially. The third condition was explicitly stated: (c) The
individual “events,” or appearances of the character observed,
must be completely independent of one another, like the throws
of a die, or sensibly so, like the drawings of balls from a bag
containing a number of balls that is very large compared with
the number drawn. Reverting to the illustration of a death-rate,
our formulæ would not apply even if the sample populations
were composed of persons of one age and one sex, if we were
dealing, for example, with deaths from an infectious or contagious
disease. For if one person in a certain sample has contracted
the disease in question, he has increased the possibility of others
doing so, and hence of dying from the disease. The same thing
holds good for certain classes of deaths from accident, e.g. railway
accidents due to derailment, and explosions in mines: if such an
accident is fatal to one person it is probably fatal to others also,
and consequently the annual returns show large and more or
less erratic variations.
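Condition (c) can be illustrated in the same way with a deliberately crude model of the accidents just described, introduced here only for illustration: suppose the n persons fall into groups of k who always share the same fate, each group being struck or spared with chance p independently of the other groups. Enumerating the exact distribution in Python (the function name is hypothetical) shows the variance of the number of deaths rising to k times npq.

```python
from fractions import Fraction
from math import comb

def clustered_variance(n, k, p):
    """Exact variance of the number of successes when n persons form n // k independent
    groups of k, and every member of a group shares the group's outcome."""
    groups = n // k
    q = 1 - p
    probs = [comb(groups, j) * p ** j * q ** (groups - j) for j in range(groups + 1)]
    mean = sum(k * j * pj for j, pj in enumerate(probs))
    return sum((k * j) ** 2 * pj for j, pj in enumerate(probs)) - mean ** 2

n, p = 100, Fraction(1, 10)
print(n * p * (1 - p))                     # 9, npq for independent persons
for k in (1, 2, 5):
    print(k, clustered_variance(n, k, p))  # 1 9, then 2 18, then 5 45
```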

When we speak of simple sampling in the following pages, the
term is intended to imply the fulfilment of all the conditions (a),
(b), and (c), all the samples and all the individual contributions to
each sample being taken under precisely the same conditions,
and the individual “events” or appearances of the character being
quite independent. It may be as well expressly to note that we
need not make any assumption as to the conditions that determine
p unless we have to estimate √(npq) a priori. If we draw a
sample and observe in it the actual proportion of, say, A’s;
draw another sample under precisely the same conditions, and
observe the proportion of A’s in the two samples together; add
to these a third sample, and so on, we will find that p approaches
—not continuously, but with some fluctuations—closer and closer
to some limiting value. It is this limiting value which is to be
used in our formulæ—the value of p that would be observed in
a very large sample. The standard-deviation of the number of
sixes thrown with n dice, on this understanding, may be √(npq),
even if the dice be out of truth or loaded so that p is no longer
1/6. Similarly, the standard-deviation of the number of black
balls in samples of n drawn from an infinitely large mixture of
black and white balls in equal proportions may be √(npq) even

        <pb n="288" />

if p is, say, 1/3, and not 1/2, owing to the black balls, for some
reason, tending to slip through our fingers. (Cf. Chap. XIV.
§ 4.)
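The approach of the observed proportion to its limiting value can be watched in a simulation. The Python sketch below assumes, purely for illustration, a die loaded so that a six has chance 0.2 instead of 1/6; the function name and the seed are arbitrary. Individual runs differ, but the running proportion settles towards 0.2, with fluctuations of the order of √(pq/n).

```python
import random
from math import sqrt

def running_proportion(chance, checkpoints, seed=1):
    """Proportion of successes observed after each checkpoint number of throws."""
    rng = random.Random(seed)
    successes, done, out = 0, 0, []
    for target in checkpoints:          # checkpoints must be increasing
        while target > done:
            successes += chance > rng.random()   # True counts as one success
            done += 1
        out.append(successes / done)
    return out

chance = 0.2                            # hypothetical loaded die
checkpoints = [10, 100, 1000, 10_000, 100_000]
for n, prop in zip(checkpoints, running_proportion(chance, checkpoints)):
    print(n, round(prop, 4), "expected fluctuation about", round(sqrt(chance * (1 - chance) / n), 4))
```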

9. It is evident that these conditions very much limit the
field of practical cases of an economic or sociological character
to which formulæ (1) and (2) can apply without considerable
modification. The formulæ appear, however, to hold to a high
degree of approximation in certain biological cases, notably in
the proportions of offspring of different types obtained on crossing
hybrids, and, with some limitations, to the proportions of the
two sexes at birth. It is possible, accordingly, that in these cases
all the necessary conditions are fulfilled, but this is not a necessary
inference from the mere applicability of the formulæ (cf. Chap.
XIV. § 15). In the case of the sex-ratio at birth, it seems
doubtful whether the rule applies to the frequency of the sexes in
individual families of given numbers (ref. 9), but it does apply
fairly closely to the sex-ratios of births in different localities,
and still more closely to the ratios in one locality during
successive periods. That is to say, if we note the number of
males in a series of groups of n births each, the standard-deviation
of that number is approximately √(npq), where p is the chance
of a male birth; or, otherwise, √(pq/n) is the standard-deviation
of the proportion of male births. We are not able to assign an
a priori value to the chance p as in the case of dice-throwing,
but it is quite sufficiently accurate for practical purposes to use
the proportion of male births actually observed if that proportion
be based on a moderately large number of observations.

10. In Table VI. of Chap. IX. (p. 163) was given a correlation-
table between the total numbers of births in the registration districts
of England and Wales during the decade 1881-90 and the pro-
portion of male births. The table below gives some similar figures,
based on the same data, for a few isolated groups of districts con-
taining not less than 30 to 40 districts each. In both tables the
drop in dispersion as we pass from the small to the large districts
is extremely striking. The actual standard-deviations, and the
standard-deviations of simple sampling corresponding to the mid-
numbers of births, are given at the foot of the table, and it will
be seen that the two agree, on the whole, with surprising closeness,
considering the small numbers of observations. The actual
standard-deviation is, however, the larger of the two in every case
but one. The corresponding standard-deviations for Table VI. of
Chap. IX. are given in Qu. 7 at the end of this chapter, and show
the same general agreement with the standard-deviations of simple
sampling ; the actual standard-deviations are, however, again, as
a rule, slightly in excess of the theoretical values.

        <pb n="289" />

TABLE showing Frequencies of Registration Districts in England and Wales with
Different Ratios of Male to Total Births during the Decade 1881-90, for
Groups of Districts with the Numbers of Births in the Decade lying between
Certain Limits. [Data based on Decennial Supplement to Fifty-fifth Annual
Report of the Registrar-General for England and Wales.]

Number of births in the decade (columns): 1500 to 2500; 3500 to 4000; 4500 to 5000;
10,000 to 15,000; 15,000 to 20,000; 30,000 to 50,000; 50,000 to 90,000.
Male births per thousand total births (rows): classes from 466-7 to 536-7.
[The body of the table is not legible in the source scan. Legible entries: for the
first column, total 36 districts, mean 508.2, standard-deviation 12.8; for the second
column, standard-deviation 8.53; theoretical standard-deviations corresponding to the
mean births, 11.2 and 8.16 for the first two columns and 2.50 and 1.89 for the last two.
A final row, marked *, is also illegible.]
* The meaning of this expression is explained in § 10 of Chap. XIV.

        <pb n="290" />

The student should note that in both cases the standard-devia-
tions given are standard-deviations of the proportion of male
births per 1000 of all births, that is, 1000 times the values given
by equation (2). These values are given by simply substituting
the proportions per 1000 for p and q in the formula. Thus for
the first column of Table I. the proportion of males is 508 per
1000 births, the mid-number of births 2000, and therefore

$$s = \left(\frac{508 \times 492}{2000}\right)^{1/2} = 11.2$$
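The same arithmetic gives the theoretical row of the table for any column. The Python sketch below (hypothetical function name) assumes, as the worked example does for the first column, a proportion of 508 males per 1000 births throughout, and uses the mid-numbers 2000, 3750, 40,000 and 70,000 of the first two and last two columns. On that assumption it reproduces the printed values 11.2, 8.16, 2.50 and 1.89 to the precision shown.

```python
from math import sqrt

def sd_per_1000(p_per_1000, births):
    """Equation (2) scaled to proportions per 1000: sqrt(pq/n) with p and q per 1000."""
    return sqrt(p_per_1000 * (1000 - p_per_1000) / births)

for births in (2000, 3750, 40_000, 70_000):
    print(births, round(sd_per_1000(508, births), 2))
# 2000 11.18
# 3750 8.16
# 40000 2.5
# 70000 1.89
```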

11. In the above illustration the difficulty due to the wide
variation in the number of births in different districts has been
surmounted by grouping these districts in limited class intervals,
and assuming that it would be sufficiently accurate for practical
purposes to treat all the districts in one class as if the sex-ratios
had been based on the mid-numbers of births. Given a sufficiently
large number of observations, such a process does well enough,
though it is not very good. But if the number of observations
does not exceed, perhaps, 50 or 60 altogether, grouping is
obviously out of the question, and some other procedure must be
adopted.

Suppose, then, that a series of samples have been taken from
the same material, f₁ samples containing n₁ individuals or observa-
tions each, f₂ containing n₂, f₃ containing n₃, and so on. What
would be the standard-deviation of the observed proportions in
these samples? Evidently the square of the standard-deviation
in the first group would be pq/n₁, in the second pq/n₂, and so on:
therefore, as the means tend to the same values in all the groups,
we must have for the whole series, N = f₁ + f₂ + f₃ + ... being the
total number of samples,

$$N s^2 = pq\left(\frac{f_1}{n_1} + \frac{f_2}{n_2} + \frac{f_3}{n_3} + \cdots\right)$$

But if H be the harmonic mean of n₁, n₂, n₃, ...,

$$\frac{N}{H} = \frac{f_1}{n_1} + \frac{f_2}{n_2} + \frac{f_3}{n_3} + \cdots$$

and accordingly

$$s^2 = \frac{pq}{H} \qquad (3)$$

That is to say, where the number of observations varies from one
sample to another, the harmonic mean number of observations in
a sample must be substituted for n in equation (2).
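Equation (3) can be checked numerically by averaging pq/n over every sample directly and comparing the result with pq/H. The sample sizes in the Python sketch below are invented (10 samples of 2, 20 of 4 and 5 of 8), and the function names are hypothetical.

```python
from math import isclose

def mean_sampling_variance(p, sizes, counts):
    """Average of pq/n over all samples; counts[i] samples contain sizes[i] observations each."""
    total = sum(counts)
    return sum(f * p * (1 - p) / n for n, f in zip(sizes, counts)) / total

def harmonic_mean(sizes, counts):
    """Harmonic mean of the sample sizes, each size weighted by its number of samples."""
    return sum(counts) / sum(f / n for n, f in zip(sizes, counts))

sizes, counts = [2, 4, 8], [10, 20, 5]
p = 0.25
H = harmonic_mean(sizes, counts)
arith = sum(n * f for n, f in zip(sizes, counts)) / sum(counts)
print(round(H, 3), arith)                                                  # 3.294 4.0
print(isclose(mean_sampling_variance(p, sizes, counts), p * (1 - p) / H))  # True
```

Substituting the arithmetic mean size, 4.0 here, for H would understate s²; the harmonic mean is the smaller of the two whenever the sample sizes differ.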
Thus the following percentages (taken to the nearest unit) of
        <pb n="291" />
albinos were obtained in 121 litters from hybrids of Japanese
waltzing mice by albinos, crossed inter se (A. D. Darbishire,
Biometrika, iii. p. 30): —

[Frequency table of the percentage of albinos per litter, only partly legible in the
source scan. Legible entries: 40 per cent., 2 litters; 43 per cent., 2 litters;
50 per cent., 16 litters.]
The distribution is very irregular owing to the small numbers in
the litters, and the standard-deviation is 23.09 per cent. The
numbers of litters of different sizes were given in § 27 of Chap.
VII. p. 128, and the harmonic mean size of litter was found to be
3.03. The expected proportion of albinos is 25 per cent., and
hence the standard-deviation of sampling is

$$s = \left(\frac{25 \times 75}{3.03}\right)^{1/2} = 24.9$$

in very close agreement with the actual value. The proportion
of albinos amongst all the offspring together was 24.7 per cent.

12. If one of the two proportions $p$ and $q$ become very small,
equation (1) may be put into an approximate form that is very
useful. Suppose $p$ to be the proportion that becomes very small,
so that we may neglect $p^2$ compared with $p$: then

$$pq = p - p^2 \approx p,$$

and consequently we have approximately

$$\sigma_n = \sqrt{npq} \approx \sqrt{np} = \sqrt{M} \qquad (4)$$

where $M = np$ is the mean number of successes. That is to say, if the
proportion of successes be small, the standard-deviation of the number
of successes is the square root of the mean number of successes. Hence
we can find the standard-deviation of sampling even though $p$ be unknown,
provided only we know that it is small.

Thus (ref. 15) in 10 Prussian army corps in 20 years (1875-1894)
there were 122 men killed by the kick of a horse, or, on an
average, there were 0.61 deaths from that cause in each army
corps annually. From equation (4) we accordingly have for the
standard-deviation of simple sampling

$$\sigma = (0.61)^{1/2} = 0.78.$$

        <pb n="292" />
        THEORY OF STATISTICS.
The frequency-distribution of the number of deaths per army
corps per annum was:

Deaths.   Frequency.
   0         109
   1          65
   2          22
   3           3
   4           1

whence

$$\sigma^2 = 0.6079, \qquad \sigma = 0.78,$$

an almost exact agreement with the standard-deviation of simple
sampling.
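The comparison can be reproduced with a short Python sketch. The names are ours. It takes the frequencies 109, 65, 22, 3 and 1 for 0 to 4 deaths per corps-year, computes the observed mean and variance, and sets the observed standard-deviation beside $\sqrt{M}$ from equation (4).

```python
import math

# Deaths per army corps per annum (0 to 4) and their frequencies.
deaths = [0, 1, 2, 3, 4]
freq = [109, 65, 22, 3, 1]

N = sum(freq)                                            # 200 corps-years
mean = sum(d * f for d, f in zip(deaths, freq)) / N      # M, mean deaths
variance = sum(f * (d - mean) ** 2 for d, f in zip(deaths, freq)) / N

print(f"corps-years = {N}, total deaths = {round(mean * N)}")
print(f"mean M = {mean:.2f}, sqrt(M) = {math.sqrt(mean):.3f}")
print(f"observed variance = {variance:.4f}, observed s.d. = {math.sqrt(variance):.3f}")
```

The observed variance (0.6079) is very close to the mean (0.61). That is what equation (4) leads us to expect when the event is rare.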
13. We may now turn from these verifications of the theoretical
results for various special cases to the use of the formulae for
checking and controlling the interpretation of statistical results.
If we observe, in a statistical sample, a certain proportion of
objects or individuals possessing some given character, say A’s,
this proportion differing more or less from the proportion which
for some reason we expected, the question always arises whether
the difference may be due to the fluctuations of simple sampling
only, or may be indicative of definite differences between the
conditions in the universe from which the sample has been drawn
and the assumed conditions on which we based our expectation.
Similarly, if we observe a different proportion in one sample from
that which we have observed in another, the question again arises
whether this difference may be due to fluctuations of simple
sampling alone, or whether it indicates a difference between the
conditions subsisting in the universes from which the two samples
were drawn: in the latter case the difference is often said to be
significant. These questions can be answered, though only more
or less roughly at present, by comparing the observed difference
with the standard-deviation of simple sampling. We know
roughly that the great bulk at least of the fluctuations of sampling
lie within a range of ± three times the standard-deviation;
and if an observed difference from a theoretical result greatly
exceeds these limits it cannot be ascribed to a fluctuation of
“simple sampling” as defined in § 8: it may therefore be significant.
The “standard-deviation of simple sampling” being the
basis of all such work, it is convenient to refer to it by a shorter
name. The observed proportions of A’s in given samples being
regarded as differing by larger or smaller errors from the true
proportion in a very large sample from the same material, the
        <pb n="293" />
“standard-deviation of simple sampling” may be regarded as a
measure of the magnitude of such errors, and may be called
accordingly the standard error.

Three principal cases of comparison may be distinguished.

Case I.—It is desired to know whether the deviation of a certain
observed number or proportion from an expected theoretical value
is possibly due to errors of sampling.

In this case the observed difference is to be compared with the
standard error of the theoretical number or proportion, for the
number of observations contained in the sample.

Example i.—In the first illustration of § 7, 25,145 throws of a 4,
5, or 6 were made in lieu of the 24,576 expected (out of 49,152
throws altogether). The excess is 569 throws. Is this excess
possibly due to mere fluctuations of sampling?

The standard error is

$$\sigma = \left(\tfrac{1}{2} \times \tfrac{1}{2} \times 49{,}152\right)^{1/2} = 110.9.$$

The deviation observed is 5.1 times the standard error, and,
practically speaking, could not occur as a fluctuation of simple
sampling. It may perhaps indicate a slight bias in the dice.

The problem might, of course, have been attacked equally well
from the standpoint of the proportion in lieu of the absolute
number of 4’s, 5’s, or 6’s thrown. This proportion is 0.5116 instead
of the theoretical 0.5000, difference in excess 0.0116. The
standard error of the proportion is

$$\left(\frac{\tfrac{1}{2} \times \tfrac{1}{2}}{49{,}152}\right)^{1/2} = 0.00226,$$

and the difference observed bears the same ratio to the standard
error as before, as of course it must.

Example ii.—(Data from the Second Report of the Evolution
Committee of the Royal Society, 1905, p. 72.)

Certain crosses of Pisum sativum gave 5321 yellow and 1804
green seeds. The expectation is 25 per cent. of green seeds, or
1781. Can the divergence from the exact theoretical result have
arisen owing to errors of sampling only?

The numerical difference from the expected result is 23. The
standard error is

$$\sigma = (0.25 \times 0.75 \times 7125)^{1/2} = 36.6.$$
Hence the divergence from theory is only some 3/5 of the
standard error, and may very well have arisen owing simply to
fluctuations of sampling.

        <pb n="294" />

Working from the observed proportion of green seeds, viz. 0.2532,
instead of the theoretical 0.25, we have

$$\sigma = \left(\frac{0.25 \times 0.75}{7125}\right)^{1/2} = 0.0051,$$

and similarly the divergence from theory is only some 3/5 of the
standard error, as before.
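Case I reduces to a single ratio: the observed deviation divided by $\sqrt{npq}$ for numbers, or by $\sqrt{pq/n}$ for proportions. The Python sketch below is ours, and the helper `case_one` is a hypothetical name. It recomputes Examples i and ii from the counts quoted in the text.

```python
import math

def case_one(observed, n, p):
    """Deviation of an observed count from np, its standard error sqrt(npq), and their ratio."""
    expected = n * p
    se = math.sqrt(n * p * (1 - p))
    return observed - expected, se, (observed - expected) / se

# Example i: throws of 4, 5 or 6 among 49,152 throws, p = 1/2.
dev, se, ratio = case_one(25_145, 49_152, 0.5)
print(f"dice: deviation {dev:.0f}, s.e. {se:.1f}, ratio {ratio:.2f}")

# Example ii: green seeds among 5321 + 1804 = 7125 seeds, p = 1/4.
dev, se, ratio = case_one(1_804, 7_125, 0.25)
print(f"peas: deviation {dev:.2f}, s.e. {se:.2f}, ratio {ratio:.2f}")

# Working with proportions instead of numbers gives the same ratio.
n, p = 7_125, 0.25
ratio_prop = (1_804 / n - p) / math.sqrt(p * (1 - p) / n)
print(f"peas (proportion form): ratio {ratio_prop:.2f}")
```

For the dice the ratio is a little above 5. For the peas it is about 0.62, in line with the "some 3/5" of the text.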

It should be noted that this method must not be used as a test
of association by comparing the difference of $(AB)$ from $(A)(B)/N$
with a standard error calculated from the latter value as a
“theoretical number,” for it is not a theoretical number given
a priori as in the above illustrations, and $(A)$ and $(B)$ are themselves
liable to errors of sampling. If we formed an association-table
between the results of tossing two coins $N$ times, $\sigma = (N \cdot \tfrac{1}{4} \cdot \tfrac{3}{4})^{1/2}$
would be the standard error for the divergence of $(AB)$ from the
a priori value $N/4$, not the standard error for differences of $(AB)$
from $(A)(B)/N$, $(A)$ and $(B)$ being the numbers of heads thrown
in the case of the first and the second coin respectively.
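The distinction can be checked by simulation. The Python sketch below is ours. It assumes two fair coins, $N = 100$ tosses per trial and 5000 trials, all arbitrary choices. For each trial it records the deviation of $(AB)$ from the a priori $N/4$ and from $(A)(B)/N$, then compares the two standard-deviations.

The first should come out near $\sqrt{3N/16} \approx 4.33$. The second is noticeably smaller: for fair coins our own calculation gives $\sqrt{(N-1)/16} \approx 2.49$. Applying the first as the standard error of the second would therefore overstate the error.

```python
import random
import statistics

def simulate(n_tosses=100, n_trials=5000, seed=1):
    """Standard-deviations of (AB) - N/4 and of (AB) - (A)(B)/N for two fair coins."""
    rng = random.Random(seed)
    dev_apriori, dev_marginal = [], []
    for _ in range(n_trials):
        first = [rng.getrandbits(1) for _ in range(n_tosses)]    # 1 = head on coin 1
        second = [rng.getrandbits(1) for _ in range(n_tosses)]   # 1 = head on coin 2
        ab = sum(x * y for x, y in zip(first, second))           # (AB): both heads
        a, b = sum(first), sum(second)                           # (A), (B)
        dev_apriori.append(ab - n_tosses / 4)
        dev_marginal.append(ab - a * b / n_tosses)
    return statistics.pstdev(dev_apriori), statistics.pstdev(dev_marginal)

sd_apriori, sd_marginal = simulate()
print(f"s.d. about N/4:      {sd_apriori:.2f}  (theory {(100 * 3 / 16) ** 0.5:.2f})")
print(f"s.d. about (A)(B)/N: {sd_marginal:.2f}")
```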

Case II.—Two samples from distinct materials or different
universes give proportions of A’s $p_1$ and $p_2$, the numbers of
observations in the samples being $n_1$ and $n_2$ respectively. (a) Can
the difference between the two proportions have arisen merely as a
fluctuation of simple sampling, the two universes being really
similar as regards the proportion of A’s therein? (b) If the
difference indicated were a real one, might it vanish, owing to
fluctuations of sampling, in other samples taken in precisely the
same way? This case corresponds to the testing of an association
which is indicated by a comparison of the proportion of A’s amongst
the B’s and the β’s.

(a) We have no theoretical expectation in this case as to the
proportion of A’s in the universe from which either sample has
been taken.

Let us find, however, whether the observed difference between $p_1$
and $p_2$ may not have arisen solely as a fluctuation of simple
sampling, the proportion of A’s being really the same in both cases,
and given, let us say, by the (weighted) mean proportion in our
two samples together, i.e. by

$$p_0 = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2}$$

(the best guide that we have).
Let $\varepsilon_1$, $\varepsilon_2$ be the standard errors in the two samples; then

$$\varepsilon_1^2 = \frac{p_0 q_0}{n_1}, \qquad \varepsilon_2^2 = \frac{p_0 q_0}{n_2}.$$

If the samples are simple samples in the sense of the previous
work, then the mean difference between $p_1$ and $p_2$ will be zero,

        <pb n="295" />
and the standard error of the difference $\varepsilon_{12}$, the samples being
independent, will be given by

$$\varepsilon_{12}^2 = \varepsilon_1^2 + \varepsilon_2^2 = p_0 q_0 \left(\frac{1}{n_1} + \frac{1}{n_2}\right) \qquad (5)$$

If the observed difference is less than some three times $\varepsilon_{12}$, it
may have arisen as a fluctuation of simple sampling only.

(b) If, on the other hand, the proportions of A’s are not the same
in the material from which the two samples are drawn, but $p_1$ and
$p_2$ are the true values of the proportions, the standard errors of
sampling in the two cases are

$$\varepsilon_1^2 = \frac{p_1 q_1}{n_1}, \qquad \varepsilon_2^2 = \frac{p_2 q_2}{n_2},$$

and consequently

$$\varepsilon_{12}^2 = \frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2} \qquad (6)$$

If the difference between $p_1$ and $p_2$ does not exceed some three
times this value of $\varepsilon_{12}$, it may be obliterated by an error of simple
sampling on taking fresh samples in the same way from the same
material.

Further, the student should note that the value of $\varepsilon_{12}$ given by
equation (6) is frequently employed, in lieu of that given by
equation (5), for testing the significance of an observed difference.
The justification of this usage we indicate briefly later (Chap.
XIV., § 3). Here it is sufficient to state that, if $n$ be large,
equation (6) gives approximately the standard-deviation of the
true values of the difference for a given observed value, and hence,
if the observed difference is greater than some three times
the value of $\varepsilon_{12}$ given by (6), it is hardly possible that the true
value of the difference can be zero. The difference between the
values of $\varepsilon_{12}$ given by (5) and (6) is indeed, as a rule, of more
theoretical than practical importance, for they do not differ largely
unless $p_1$ and $p_2$ differ largely, and in that case either formula will
place the difference outside the range of fluctuations of sampling.

Example iii.—The following data were given in Qu. 3 of Chap.
III. for plants of Lobelia fulgens obtained by cross- and self-fertilisation
respectively:

               Parentage Cross-fertilised.       Parentage Self-fertilised.
               Height                            Height
               Above Average.  Below Average.    Above Average.  Below Average.
                    17              17                12              22

The figures indicate an association between tallness and cross-fertilisation
of parentage. Is this association significant of some
real difference, or may it have arisen solely as an “error of
        <pb n="296" />
sampling”? The proportion of plants above average height in the
two classes (cross- and self-fertilised) together is 29/68. The
standard-deviation of the differences due to simple sampling
between the proportions of “tall” plants in two samples of 34
observations each is therefore

$$\sigma = \left(\frac{29}{68} \times \frac{39}{68} \times \frac{2}{34}\right)^{1/2} = 0.120,$$

or 12.0 per cent. The actual proportions observed are 50 per
cent. and 35 per cent., difference 15 per cent. As this difference
is only slightly in excess of the standard error of the difference,
for samples of 34 observations drawn from identical material, no
definite significance could be attached to it, if it stood alone.

The student will notice, however, that all the other cases cited
from Darwin in the question referred to show an association of
the same sign, but rather more marked. Hence the difference
observed may be a real one, or perhaps the real difference may be
greater and may be partially masked by a fluctuation of sampling.
If 50 per cent. and 35 per cent. were the true proportions in the
two classes, the standard error of the percentage difference would
be, by equation (6),

$$\varepsilon_{12} = \left(\frac{50 \times 50 + 35 \times 65}{34}\right)^{1/2} = 11.9 \text{ per cent.},$$
and consequently the actual difference might not infrequently be
completely masked by fluctuations of sampling, so long as experi-
ments were only conducted on the same small scale.

Example iv.—(Data from J. Gray, Memoir on the Pigmentation
Survey of Scotland, Jour. of the Royal Anthropological Institute,
vol. xxxvii., 1907.) The following are extracted from the tables
relating to hair-colour of girls at Edinburgh and Glasgow:

               Of Medium      Total        Per cent.
               Hair-colour.   observed.    Medium.
Edinburgh        4,008          9,743        41.1
Glasgow         17,529         39,764        44.1

Can the difference observed in the percentage of girls of medium
hair-colour have arisen solely through fluctuations of sampling?

In the two towns together the percentage of girls with medium
hair-colour is 43.5 per cent. If this were the true percentage,
the standard error of sampling for the difference between percentages
observed in samples of the above sizes would be

$$\varepsilon_{12} = \left[(43.5 \times 56.5)\left(\frac{1}{9743} + \frac{1}{39{,}764}\right)\right]^{1/2} = 0.56 \text{ per cent.}$$

        <pb n="297" />
The actual difference is 3.0 per cent., or over 5 times this, and
could not have arisen through the chances of simple sampling.

If we assume that the difference is a real one and calculate the
standard error by equation (6), we arrive at the same value, viz.
0.56 per cent. With such large samples the difference could not,
accordingly, be obliterated by the fluctuations of simple sampling
alone.
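The two forms of the standard error of a difference can be set side by side. In this Python sketch (helper names ours), `x1` and `x2` are the numbers of A’s in samples of sizes `n1` and `n2`, taken from the tables of Examples iii and iv. `se_pooled` implements equation (5) and `se_separate` implements equation (6).

```python
import math

def se_pooled(x1, n1, x2, n2):
    """Equation (5): s.e. of p1 - p2 when both universes share the proportion p0."""
    p0 = (x1 + x2) / (n1 + n2)
    return math.sqrt(p0 * (1 - p0) * (1 / n1 + 1 / n2))

def se_separate(x1, n1, x2, n2):
    """Equation (6): s.e. of p1 - p2 using each sample's own proportion."""
    p1, p2 = x1 / n1, x2 / n2
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

cases = {
    "Lobelia (Example iii)": (17, 34, 12, 34),
    "hair-colour (Example iv)": (4_008, 9_743, 17_529, 39_764),
}
for name, (x1, n1, x2, n2) in cases.items():
    diff = x1 / n1 - x2 / n2
    e5, e6 = se_pooled(x1, n1, x2, n2), se_separate(x1, n1, x2, n2)
    print(f"{name}: diff {diff:+.4f}, eq.(5) {e5:.4f}, eq.(6) {e6:.4f}, "
          f"ratio {abs(diff) / e5:.2f}")
```

For the Lobelia data the two forms give about 0.120 and 0.119. For the hair-colour data both round to 0.0056. This illustrates the remark that the choice between (5) and (6) matters little unless the proportions differ widely.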

Case III.—Two samples are drawn from distinct material or
different universes, as in the last case, giving proportions of
A’s $p_1$ and $p_2$, but in lieu of comparing the proportion $p_1$ with
$p_2$ it is compared with the proportion of A’s in the two samples
together, viz. $p_0$, where, as before,

$$p_0 = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2}$$

Required to find whether the difference between $p_1$ and $p_0$ can
have arisen as a fluctuation of simple sampling, $p_0$ being the
true proportion of A’s in both samples.

This case corresponds to the testing of an association which
is indicated by a comparison of the proportion of A’s amongst
the B’s with the proportion of A’s in the universe. The general
treatment is similar to that of Case II., but the work is complicated
owing to the fact that errors in $p_1$ and $p_0$ are not independent.

If $\varepsilon_{10}$ be the standard error of the difference between $p_1$ and
$p_0$, and $\varepsilon_0^2 = p_0 q_0/(n_1 + n_2)$ the square of the standard error of $p_0$,
we have at once

$$\varepsilon_{10}^2 = \varepsilon_1^2 + \varepsilon_0^2 - 2 r_{10} \varepsilon_1 \varepsilon_0 = p_0 q_0 \left(\frac{1}{n_1} + \frac{1}{n_1 + n_2} - \frac{2 r_{10}}{\sqrt{n_1 (n_1 + n_2)}}\right),$$

$r_{10}$ being the correlation between errors of simple sampling in
$p_1$ and $p_0$. But, from the above equation relating $p_0$ to $p_1$
and $p_2$, writing it in terms of deviations in $p_0$, $p_1$ and $p_2$,
multiplying by the deviation in $p_1$ and summing, we have,
since errors in $p_1$ and $p_2$ are uncorrelated,

$$r_{10} = \frac{\varepsilon_0}{\varepsilon_1} = \sqrt{\frac{n_1}{n_1 + n_2}}.$$

Therefore finally

$$\varepsilon_{10}^2 = \varepsilon_1^2 - \varepsilon_0^2 = p_0 q_0 \frac{n_2}{n_1 (n_1 + n_2)} \qquad (7)$$

Unless the difference between $p_1$ and $p_0$ exceed, say, some
three times this value of $\varepsilon_{10}$, it may have arisen solely by the
chances of simple sampling.
        <pb n="298" />

It will be observed that if $n_1$ be very small compared with
$n_2$, $\varepsilon_{10}$ approaches, as it should, the standard error for a sample
of $n_1$ observations.

We omit, in this case, the allied problem whether, if the
difference between $p_1$ and $p_0$ indicated by the samples were
real, it might be wiped out in other samples of the same size
by fluctuations of simple sampling alone. The solution is a
little complex as we no longer have $\varepsilon_0^2 = p_0 q_0/(n_1 + n_2)$.

Example v.—Taking the data of Example iii., suppose that
we compare the proportion of tall plants amongst the offspring
resulting from cross-fertilisations (viz. 50 per cent.) with the
proportion amongst all offspring (viz. 29/68, or 42.6 per cent.).
As, in this case, both the subsamples have the same number
of observations, $n_1 = n_2 = 34$, and

$$\varepsilon_{10} = \left(\frac{29}{68} \times \frac{39}{68} \times \frac{1}{68}\right)^{1/2} = 0.060,$$

or 6 per cent. As in the working of Example iii., the observed difference
is only 1.25 times the standard error of the difference, and
consequently it may have arisen as a mere fluctuation of sampling.

Example vi.—Taking now the figures of Example iv., suppose
that we had compared the proportion of girls of medium hair-colour
in Edinburgh with the proportion in Glasgow and
Edinburgh together. The former is 41.1 per cent., the latter
43.5 per cent., difference 2.4 per cent. The standard error of
the difference between the percentages observed in the subsample
of 9743 observations and the entire sample of 49,507
observations is therefore

$$\varepsilon_{10} = \left[(43.5 \times 56.5) \frac{39{,}764}{9743 \times 49{,}507}\right]^{1/2} = 0.45 \text{ per cent.}$$
The actual difference is over five times this (the ratio must, of
course, be the same as in Example iv.), and could not have occurred
as a mere error of sampling.
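Equation (7) can be checked against its own derivation. In the Python sketch below (helper names ours), `se_sub_vs_total` evaluates (7) directly. `se_via_correlation` builds the same quantity from $\varepsilon_1^2 + \varepsilon_0^2 - 2r_{10}\varepsilon_1\varepsilon_0$, using $r_{10} = \sqrt{n_1/(n_1 + n_2)}$, the value obtained from the relation between $p_0$, $p_1$ and $p_2$. Both are applied to the data of Examples v and vi.

```python
import math

def se_sub_vs_total(p0, n1, n2):
    """Equation (7): s.e. of p1 - p0 when sample 1 (size n1) is part of the total n1 + n2."""
    return math.sqrt(p0 * (1 - p0) * n2 / (n1 * (n1 + n2)))

def se_via_correlation(p0, n1, n2):
    """The same quantity from e1^2 + e0^2 - 2 r e1 e0, with r = sqrt(n1 / (n1 + n2))."""
    pq = p0 * (1 - p0)
    e1 = math.sqrt(pq / n1)              # s.e. of p1
    e0 = math.sqrt(pq / (n1 + n2))       # s.e. of the pooled p0
    r = math.sqrt(n1 / (n1 + n2))        # correlation of errors in p1 and p0
    return math.sqrt(e1**2 + e0**2 - 2 * r * e1 * e0)

# Example v: tall plants, n1 = n2 = 34, p0 = 29/68.
print(f"Example v:  {se_sub_vs_total(29 / 68, 34, 34):.4f}")

# Example vi: medium hair-colour, Edinburgh (9743) within a total of 49,507.
p0 = (4_008 + 17_529) / 49_507
print(f"Example vi: {se_sub_vs_total(p0, 9_743, 39_764):.5f}")
print(f"via r10:    {se_via_correlation(p0, 9_743, 39_764):.5f}")
```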
REFERENCES.

The theory of sampling, for the cases dealt with in this chapter, is generally
treated by first determining the frequency-distribution of the number of
successes in a sample. This frequency-distribution is not considered till
Chapter XV., and the student will be unable to follow much of the literature
until he has read that chapter.

Experimental results of dice throwing, coin tossing, etc.

(1) QUETELET, A., Lettres . . . . sur la théorie des probabilités; Bruxelles,
1846 (English translation by O. G. Downes; C. &amp; E. Layton, London,
1849). See especially letter xiv. and the table on p. 374 of the
French, p. 255 of the English, edition.

        <pb n="299" />

(2) WESTERGAARD, H., Die Grundzüge der Theorie der Statistik; Fischer,
Jena, 1890.

(3) EDGEWORTH, F. Y., Article on the “Law of Error” in the Tenth Edition
of the Encyclopædia Britannica, vol. xxviii., 1902, p. 280; or on
“Probability,” Eleventh Edition, vol. xxii. (especially Part III.,
pp. 390 et seq.).

(4) DARBISHIRE, A. D., “Some Tables for illustrating Statistical Correlation,”
Mem. and Proc. of the Manchester Lit. and Phil. Soc., vol. li., 1907.

General: and applications to sex-ratio of births.

(5) POISSON, S. D., “Sur la proportion des naissances des filles et des
garçons,” Mémoires de l’Acad. des Sciences, vol. ix., 1829, p. 239.
(Principally theoretical: the statistical illustrations very slight.)

(6) LEXIS, W., Zur Theorie der Massenerscheinungen in der menschlichen
Gesellschaft; Freiburg, 1877.

(7) LEXIS, W., Abhandlungen zur Theorie der Bevölkerungs- und Moralstatistik;
Fischer, Jena, 1903. (Contains, with new matter, reprints of
some of Professor Lexis’ earlier papers in a form convenient for
reference.)

(8) EDGEWORTH, F. Y., “Methods of Statistics,” Jour. Roy. Stat. Soc.,
Jubilee volume, 1885, p. 181.

(9) VENN, JOHN, The Logic of Chance, 3rd edn.; Macmillan, London, 1888.
(Cf. the data regarding the distribution of sexes in families on p. 264,
to which reference was made in § 9.)

(10) PEARSON, KARL, “Skew Variation in Homogeneous Material,” Phil.
Trans. Roy. Soc., Series A, vol. clxxxvi., 1895, p. 343. (Sections 2 to
6 on the binomial distribution.)

(11) EDGEWORTH, F. Y., “Miscellaneous Applications of the Calculus of
Probabilities,” Jour. Roy. Stat. Soc., vols. lx., lxi., 1897-8 (especially
part ii., vol. lxi. p. 119).

(12) VIGOR, H. D., and G. U. YULE, “On the Sex-ratios of Births in the
Registration Districts of England and Wales, 1881-90,” Jour. Roy.
Stat. Soc., vol. lxix., 1906, p. 576. (Use of the harmonic mean as in
§ 11.)

As regards the sex-ratio, reference may also be made to papers in
vols. v. and vi. of Biometrika by Heron, Weldon, and Woods.

(13) YULE, G. U., “Fluctuations of Sampling in Mendelian Ratios,” Proc.
Camb. Phil. Soc., vol. xvii., 1914, p. 425.

The law of small chances (§ 12).

(14) POISSON, S. D., Recherches sur la probabilité des jugements, etc.; Paris,
1837. (Pp. 205-7.)

(15) BORTKEWITSCH, L. VON, Das Gesetz der kleinen Zahlen; Teubner,
Leipzig, 1898.

(16) STUDENT, “On the Error of Counting with a Haemacytometer,” Biometrika,
vol. v. p. 351, 1907.

(17) RUTHERFORD, E., and H. GEIGER, with a note by H. BATEMAN,
“The probability variations in the distribution of α particles,” Phil.
Mag., Series 6, vol. xx., 1910, p. 698. (The frequency of particles
emitted during a small interval of time follows the law of small
chances: the law deduced by Bateman in ignorance of previous work.)

(18) SOPER, H. E., “Tables of Poisson’s Exponential Binomial Limit,” Biometrika,
vol. x., 1914, pp. 25-35.

(19) WHITAKER, LUCY, “On Poisson’s Law of Small Numbers,” Biometrika,
vol. x., 1914, pp. 36-71.
        <pb n="300" />
EXERCISES.

1. (Ref. 4: total of columns of all the 13 tables given.)

Compare the actual with the theoretical mean and standard-deviation for
the following record of 6500 throws of 12 dice, 4, 5, or 6 being reckoned
as a “success.”

Successes.  Frequency.    Successes.  Frequency.
    0            1             7         1351
    1           14             8          844
    2          103             9          391
    3          302            10          117
    4          711            11           21
    5         1231            12            3
    6         1411
                             Total       6500

2. (Ref. 1.)

Balls were drawn from a bag containing equal numbers of black and white
balls, each ball being returned before drawing another. The records were then
grouped by counting the number of black balls in consecutive 2’s, 3’s, 4’s, 5’s,
etc. The following give the distributions so derived for grouping by 5’s, 6’s,
and 7’s. Compare actual with theoretical means and standard-deviations.

                         Frequency.
Successes.   (a) Grouping     (b) Grouping     (c) Grouping
                 by 5’s.          by 6’s.          by 7’s.
    0              30               17                9
    1             125               65               34
    2             277              166              104
    3             224              192              151
    4             136              166              148
    5              27               69               95
    6               -                ?               40
    7               -                -                4
  Total           819                ?              585

3. (Ref. 2, p. 22.)

Ten thousand drawings of a ball from a bag containing equal numbers of
black and white were made in the same manner as in the preceding example,
and then grouped into 100 sets of 100. The following gives the resulting
frequency of different numbers of white balls. Compare mean and standard-
deviation with theory.

[Table: frequencies of the numbers of white balls from 34 to 63, set out in three column pairs (34 to 43, 44 to 53, 54 to 63). Only the first entries are legible in the source scan: 34, frequency 1; 35 and 36, none.]

        <pb n="301" />

4. The proportion of successes in the data of Qu. 1 is 0.5097. Find the
standard-deviation of the proportion with the given number of throws, and state
whether you would regard the excess of successes as probably significant of bias
in the dice.

5. In the 4096 drawings on which Qu. 2 is based 2030 balls were black
and 2066 white. Is this divergence probably significant of bias?

6. If a frequency-distribution such as those of Questions 1, 2, and 3 be given,
show how $n$ and $p$, if unknown, may be approximately determined from the
mean and standard-deviation of the distribution.

Find $n$ and $p$ in this way from the data of Qu. 1 and Qu. 3.

7. Verify the following results for Table VI. of Chapter IX. p. 163, and
compare the results of the different grouping of the table on p. 263. In
calculating the actual standard-deviation, use Sheppard’s correction for
grouping (p. 212).

                                   Actual              Standard-deviation
Row or Rows.          Mean.    Standard-deviation s.    of Sampling s₀.*
1                     508.2          11.60                   11.18
2                     509.5           6.79                    6.45
3                     510.0           5.28                    5.00
4                     511.1           5.03                    4.22
5                     510.2           3.67                    3.73
6, 7                  509.7           4.13                    3.24
8, 9, 10, 11          508.7           3.10                    2.69
12, 13, 14            508.4           2.55                    2.25
15 and upwards        508.2           2.13                    1.85

8. In a case of mice-breeding (see reference given in § 11) the harmonic
mean number in a litter was 4.735, and the expected proportion of albinos
50 per cent. Find the standard-deviation of simple sampling for the proportion
of albinos in a litter, and state whether the actual standard-deviation
(21.63 per cent.) probably indicates any real variation, or not.

9. (Data from Report i., Evolution Committee of the Royal Society, p. 17.)
In breeding certain stocks 408 hairy and 126 glabrous plants were obtained.
If the expectation is one-fourth glabrous, is the divergence significant, or might
it have occurred as a fluctuation of sampling ?

10. (Data of Example ix. and Qu. 5, Chap. III.) Is the association in
either of the following cases likely to have arisen as a fluctuation of simple
sampling?

(a) (AB) = 47    (Aβ) = 12    (αB) = 21    (αβ) = 3
(b) (AB) = 309   (Aβ) = 214   (αB) = 132   (αβ) = 119

11. The sex-ratio at birth is sometimes given by the ratio of male to female
births, instead of the proportion of male to total births. If $Z$ is the ratio, i.e.
$Z = p/q$, show that the standard error of $Z$ is approximately

$$(1 + Z)\sqrt{\frac{Z}{n}},$$

$n$ being large, so that deviations are small compared with the mean. (The
student may find it useful to refer to § 8, Chap. XI.)

* Based on the mid-value of the class-interval for single rows, or the
harmonic mean of the mid-values for groups of rows.
        <pb n="302" />
        CHAPTER XIV
SIMPLE SAMPLING CONTINUED: EFFECT OF
REMOVING THE LIMITATIONS OF SIMPLE SAMPLING.
1. Warning as to the assumption that three times the standard error gives the
range for the majority of fluctuations of simple sampling of either sign
—2. Warning as to the use of the observed for the true value of p in
the formula for the standard error—3. The inverse standard error, or
standard error of the true proportion for a given observed proportion:
equivalence of the direct and inverse standard errors when n is large
—4-8. The importance of errors other than fluctuations of “simple
sampling” in practice: unrepresentative or biassed samples—9-10.
Effect of divergences from the conditions of simple sampling: (a)
effect of variation in p and q for the several universes from which the
samples are drawn—11-12. (b) Effect of variation in p and q from one
sub-class to another within each universe—13-14. (c) Effect of a
correlation between the results of the several events—15. Summary.
1. THERE are two warnings as regards the methods adopted in
the examples in the concluding section of the last chapter
which the student should note, as they may become of importance
when the number of observations is small. In the first place, he
should remember that, while we have taken three times the
standard error as giving the limits within which the great
majority of errors of sampling of either sign are contained,
the limits are not, as a rule, strictly the same for positive and
for negative errors. As is evident from the examples of actual
distributions in § 7, Chap. XIII, the distribution of errors is not
strictly symmetrical unless p=¢=05. No theoretical rule as
to the limits can be given, but it appears from the examples
referred to and from the calculated distributions in Chap. XV.
§ 3, that a range of three times the standard error includes
the great majority of the deviations in the direction of the
longer “tail” of the distribution, while the same range on the
shorter side may extend beyond the limits of the distribution
altogether. If, therefore, p be less than 0°5, our assumed range
may be greater than is possible for negative errors, or if p be
        <pb n="303" />
        XIV.—REMOVING LIMITATIONS OF SIMPLE SAMPLING. 277
greater than 0.5, greater than is possible for positive errors. The
assumption is not, however, likely as a rule to lead to a serious
mistake; as stated at the commencement of this paragraph, the
point is of importance only when $n$ is small, for when $n$ is large the
distribution tends to become sensibly symmetrical even for values
of $p$ differing considerably from 0.5. (Cf. Chap. XV. for the
properties of the limiting form of distribution.)
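The asymmetry is easy to exhibit with exact binomial probabilities. The Python sketch below is ours, with $n = 20$ and $p = 0.1$ chosen arbitrarily. The lower limit $np - 3\sigma$ falls below zero, so on the shorter side the range extends beyond the distribution altogether. Beyond the upper limit, a small tail of probability still remains.

```python
import math

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 20, 0.1
mean = n * p
sd = math.sqrt(n * p * (1 - p))
low, high = mean - 3 * sd, mean + 3 * sd

# Probability of a count strictly beyond each limit.
p_below = sum(binomial_pmf(k, n, p) for k in range(n + 1) if low > k)
p_above = sum(binomial_pmf(k, n, p) for k in range(n + 1) if k > high)

print(f"mean {mean:.2f}, s.d. {sd:.3f}, limits {low:.2f} to {high:.2f}")
print(f"P(below lower limit) = {p_below:.5f}, P(above upper limit) = {p_above:.5f}")
```

With $p = 0.5$ the two tails would be mirror images. The ±3σ rule becomes lopsided only as $p$ moves away from 0.5 while $n$ stays small.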

2. In the second place, the student should note that, where we
were unable to assign any a priori value to $p$, we have assumed
that it is sufficiently accurate to replace $p$ in the formula for the
standard error by the proportion actually observed, say $\pi$.
Where $n$ is large so that the standard error of $\pi$ becomes small
relatively to the product $pq$, the assumption is justifiable, and no
serious error is possible. If, however, $n$ be small, the use of the
observed value $\pi$ may lead to an under- or over-estimation of the
standard error which cannot be neglected. To get some rough
idea of the possible importance of such effects, the approximate
standard error $\varepsilon$ may first be calculated as usual from the
observed proportion $\pi$, and then fresh values recalculated, replacing
$\pi$ by $\pi \pm 3\varepsilon$. It should be remembered that the maximum
value of the product $pq$ is given by $p = q = 0.5$, and hence these
values, if within the limits of fluctuations of sampling, will give
one limiting value for the standard error. The procedure is by
no means exact, but may serve to give a useful warning.

Thus in Example iii. of Chap. XIII. the observed proportion of
tall plants is 29/68, or, say, 43 per cent. The standard error of
this proportion is 6 per cent., and a true proportion of 50 per
cent. is therefore well within the limits of fluctuations of sampling.
The maximum value of the standard error is therefore

$$\left(\frac{50 \times 50}{68}\right)^{1/2} = 6.06 \text{ per cent.}$$

On the other hand, the standard error is unlikely to be lower
than that based on a proportion of 43 - 18 = 25 per cent., i.e.

$$\left(\frac{25 \times 75}{68}\right)^{1/2} = 5.25 \text{ per cent.}$$
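The rough check just described can be scripted. This Python sketch is ours. It assumes that the range $\pi \pm 3\varepsilon$ stays inside 0 and 1, as it does here. The largest standard error comes from the point of the range nearest 0.5, and the smallest from the end of the range farthest from it.

```python
import math

def se(p, n):
    """Standard error of a proportion p in a sample of n."""
    return math.sqrt(p * (1 - p) / n)

n = 68
pi = 29 / 68                      # observed proportion of tall plants
eps = se(pi, n)                   # usual approximate standard error
low, high = pi - 3 * eps, pi + 3 * eps

# pq is largest at p = 0.5: clamp 0.5 into [low, high].
p_for_max = min(max(0.5, low), high)
# pq is smallest at whichever end of the range lies farther from 0.5.
p_for_min = low if abs(low - 0.5) > abs(high - 0.5) else high

print(f"observed {pi:.3f}, s.e. {eps:.4f}, range {low:.3f} to {high:.3f}")
print(f"largest s.e. {se(p_for_max, n):.4f}, smallest s.e. {se(p_for_min, n):.4f}")
```

Working with 29/68 unrounded gives about 5.23 per cent. for the lower value. The text's 5.25 per cent. comes from rounding the lower proportion to 25 per cent.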

3. The two difficulties mentioned in §§ 1 and 2 arise when $n$,
the number of cases in the sample, is small. The interpretation
of the value of the standard error is also more limited in this
case than when $n$ is large. Suppose a large number of observations
to be made, by means of samples of $n$ observations each, on
different masses of material, or in different universes, for each of
which the true value of $p$ is known. On these data we could
        <pb n="304" />

form a correlation-table between the true proportion $p$ in a given
universe and the observed proportion $\pi$ in a sample of $n$ observations
drawn therefrom. What we have found from the work of
the last chapter is that the standard-deviation of an array of $\pi$’s
associated with a certain true value $p$, in this table, is $(pq/n)^{1/2}$;
but the question may be asked: what is the standard-deviation
of the array at right angles to this, i.e. the array of $p$’s associated
with a certain observed proportion $\pi$? In other words, given an
observed proportion $\pi$, what is the standard-deviation of the true
proportions? This is the inverse of the problem with which we
have been dealing, and it is a much more difficult problem.
On general principles, however, we can see that if $n$ be large,
the two standard-deviations will tend, on the average of all
values of $p$, to be nearly the same, while if $n$ be small the
standard-deviation of the array of $\pi$’s will tend to be appreciably the
greater of the two. For if $\pi = p + \delta$, $\delta$ is uncorrelated with $p$,
and therefore if $\sigma_p$ be the standard-deviation of $p$ in all the
universes from which samples are drawn, $\sigma_\pi$ the standard-deviation
of observed proportions in the samples, and $\sigma_\delta$ the
standard-deviation of the differences,

$$\sigma_\pi^2 = \sigma_p^2 + \sigma_\delta^2.$$

But $\sigma_\delta^2$ varies inversely as $n$. Hence if $n$ become very large, $\sigma_\delta$
becomes very small, $\sigma_\pi$ becomes sensibly equal to $\sigma_p$, and therefore
the standard-deviations of the arrays, on an average, are also
sensibly equal (for these average standard-deviations are
$\sigma_\pi (1 - r^2)^{1/2}$ and $\sigma_p (1 - r^2)^{1/2}$, $r$ being the correlation between
$p$ and $\pi$). If $n$ be large, therefore, $[\pi(1 - \pi)/n]^{1/2}$ may be
taken as giving, with sufficient exactness, the standard-deviation
of the true proportion $p$ for a given observed proportion $\pi$. But
if $n$ be small, $\sigma_\delta$ cannot be neglected in comparison with $\sigma_p$, $\sigma_\pi$ is
therefore appreciably greater than $\sigma_p$, and the standard-deviation
of the array of $\pi$’s is, on an average of all arrays, correspondingly
greater than the standard-deviation of the array of $p$’s (the statement
is not true for every pair of corresponding arrays, especially
for extreme values of $p$ near 0 and 1). Further, it should be
noticed that, while the regression of $\pi$ on $p$ is unity, i.e. the
mean of the array of $\pi$’s is identical with $p$, the type of the
array, the regression of $p$ on $\pi$ is less than unity. If we assume,
therefore, that a tabulation of all possible chances, observed
for every conceivable subject, would give a distribution of $p$
ranging uniformly between 0 and 1, or indeed grouped symmetrically
in any way round 0.5, any observed value $\pi$ greater than
0.5 will probably correspond to a true value of $p$ slightly lower
than $\pi$, and conversely. We have already referred to the use of
the inverse standard error in § 13 of Chap. XIII. (Case II., p. 269).
If we determine, for example, the standard error of the difference
Tw
        <pb n="305" />
        XIV.—REMOVING LIMITATIONS OF SIMPLE SAMPLING. 279
between two observed proportions by equation (6) of that chapter,
this may be taken, provided $n$ be large, as approximately the
standard-deviation of true differences for the given observed
difference.
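The argument can be illustrated by a small simulation, offered as a sketch under stated assumptions rather than anything in the text: the true chances $p$ are taken, as in the hypothetical tabulation just mentioned, to be spread uniformly between 0 and 1, and each observed $\pi$ is the proportion of successes in $n = 10$ independent events. Then $\sigma_p^2 = 1/12$, the mean of $pq/n$ is $1/60$, and $\sigma_\pi^2$ should be near $1/12 + 1/60 = 0.1$. Since $\delta$ is uncorrelated with $p$, the covariance of $p$ and $\pi$ is $\sigma_p^2$, so the regression of $\pi$ on $p$ should be near unity and that of $p$ on $\pi$ near $\sigma_p^2/\sigma_\pi^2 \approx 0.83$. The function name `simulate` and the seed are arbitrary choices.

```python
import random
import statistics

def simulate(n, trials, seed=1):
    """For each trial, draw a true chance p uniformly on (0, 1) (an assumption
    made for this sketch), then the observed proportion pi of successes among
    n independent events that each succeed with chance p."""
    rng = random.Random(seed)
    ps, pis = [], []
    for _ in range(trials):
        p = rng.random()
        # an event succeeds when an independent uniform draw falls below p
        successes = sum(1 for _ in range(n) if p > rng.random())
        ps.append(p)
        pis.append(successes / n)
    return ps, pis

n = 10
ps, pis = simulate(n, 50_000)
var_p = statistics.pvariance(ps)      # sigma_p squared, about 1/12
var_pi = statistics.pvariance(pis)    # sigma_pi squared, about 1/12 + 1/60
mean_p, mean_pi = statistics.fmean(ps), statistics.fmean(pis)
cov = statistics.fmean([(a - mean_p) * (b - mean_pi) for a, b in zip(ps, pis)])
print("regression of pi on p:", round(cov / var_p, 3))
print("regression of p on pi:", round(cov / var_pi, 3))
```

Increasing `n` in this sketch shrinks $\sigma_\delta^2$ and brings the second regression towards unity, in line with the statement that for large $n$ the two standard-deviations are sensibly equal.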

4. The use of standard errors must be exercised with care. It
is very necessary to remember the limited assumptions on which
the theory of simple sampling is based, and to bear in mind that
it covers those fluctuations alone which exist when all the assumed
conditions are fulfilled. The formulæ obtained for the standard
errors of proportions and of their differences have no bearing
except on the one question, whether an observed divergence of a
certain proportion from a certain other proportion that might be
observed in a more extended series of observations, or that has
actually been observed in some other series, might or might not
be due to fluctuations of simple sampling alone. Their use is
thus quite restricted, for in many cases of practical sampling this
is not the principal question at issue. The principal question in
many such cases concerns quite a different point, viz. whether the
observed proportion $\pi$ in the sample may not diverge from the
proportion $p$ existing in the universe from which it was drawn,
owing to the nature of the conditions under which the sample was
taken, $\pi$ tending to be definitely greater or definitely less than
$p$. Such divergence between $\pi$ and $p$ might arise in two distinct
ways: (1) owing to variations of classification in sorting the
$A$'s and $\alpha$'s, the characters not being well defined, a source of
error which we need not further discuss, but one which may lead
to serious results [cf. ref. 5 of Chap. V.]; (2) owing to either $A$'s
or $\alpha$'s tending to escape the attentions of the sampler. To give
an illustration from artificial chance, if on drawing samples from
a bag containing a very large number of black and white balls
the observed proportion of black balls was $\pi$, we could not
necessarily infer that the proportion of black balls in the bag was
approximately $\pi$, even though the standard error were small, and
we knew that the proportions in successive samples were subject
to the law of simple sampling. For the black balls might be,
say, much more highly polished than the white ones, so as to
tend to escape the fingers of the sampler, or they might be
represented by a number of lively black insects sheltering amongst
white stones: in neither case would the ratio of black balls to
white, or of insects to stones, be represented in their proper
proportions. Clearly, in any parallel case, inferences as to the
material from which the sample is drawn are of a very doubtful
and uncertain kind, and it is this uncertainty whether the chance
of inclusion in the sample is the same for $A$'s and $\alpha$'s, far more
than the mere divergences between different samples drawn in
        <pb n="306" />
        THEORY OF STATISTICS.

the same way, which renders many statistical results based on
samples so dubious.

5. Thus in collecting returns as to family income and
expenditure from working-class households, the families with lower
incomes are almost certain to be under-represented; they largely
“escape the sampler’s fingers” from their simple lack of ability
to keep the necessary accounts. It is almost impossible to say,
however, to what extent they are under-represented, or to form
any estimate as to the possible error when two such samples,
taken by different persons at different times, or in different places,
are compared. Again, if estimates as to crop-production are
formed on the basis of a limited number of voluntary returns,
the estimates are likely to err in excess, as the persons who
make the returns will probably include an undue proportion
of the more intelligent farmers whose crops will tend to be
above average. Whilst voluntary returns are in this way liable
to lead to more or less unrepresentative samples, compulsory
sampling does not evade the difficulty. Compulsion could not
ensure equally accurate and trustworthy returns from illiterate
and well-educated workmen, or from intelligent and unintelligent
farmers. The following of some definite rule in drawing the
sample may also produce unrepresentative samples: if samples
of fruit were taken solely from the top layers of baskets exposed
for sale, the results might be unduly favourable; if from the
bottom layer, unduly unfavourable.

6. In such cases we can see that any sample, taken in the
way supposed, is likely to be definitely biassed, in the sense
that it will not tend to include, even in the long run, equal
proportions of the $A$'s and $\alpha$'s in the original material. In other
cases there may be no obvious reason for presuming such bias,
but, on the other hand, no certainty that it does not exist. Thus
if we noted the hair-colours of the children in, say, one
school in ten in a large town, the question would arise whether
this method would tend to give an unbiassed sample of all the
children. No assured answer could be given: conjectures on
the matter would be based in part on the way in which the
schools were selected, e.g. the volunteering of teachers for the work
might in itself introduce an element of bias. Again, if say
0,000 herrings were measured as landed at various North Sea
ports, and the question were raised whether the sample was
likely to be an unbiassed sample of North Sea herrings, no
assured answer could be given. There may be no definite reason
for expecting definite bias in either case, but it may exist, and
no mere examination of the sample itself can give any information
as to whether it exists or no.

        <pb n="307" />

7. Such an examination may be of service, however, as
indicating one possible source of bias, viz. great heterogeneity in
the original material. If, for example, in the first illustration,
the hair-colours of the children differed largely in the different
schools (much more largely than would be accounted for by
fluctuations of simple sampling), it would be obvious that one
school would tend to give an unrepresentative sample, and
questionable therefore whether the five, ten or fifteen schools
observed might not also have given an unrepresentative sample.
Similarly, if the herrings in different catches varied largely, it
would, again, be difficult to get a representative sample for a
large area. But while the dissimilarity of subsamples would
then be evidence as to the difficulty of obtaining a representative
sample, the similarity of subsamples would, of course, be no
evidence that the sample was representative, for some very
different material which should have been represented might
have been missed or overlooked.

8. The student must therefore be very careful to remember
that even if some observed difference exceed the limits of
fluctuation in simple sampling, it does not follow that it exceeds the
limits of fluctuation due to what the practical man would regard,
and quite rightly regard, as the chances of sampling. Further,
he must remember that if the standard error be small, it by no
means follows that the result is necessarily trustworthy: the
smallness of the standard error only indicates that it is not
untrustworthy owing to the magnitude of fluctuations of simple
sampling. It may be quite untrustworthy for other reasons:
owing to bias in taking the sample, for instance, or owing to definite
errors in classifying the $A$'s and $\alpha$'s. On the other hand, of course,
it should also be borne in mind that an observed proportion is not
necessarily incorrect, but merely to a greater or less extent
untrustworthy, if the standard error be large. Similarly, if an
observed proportion $\pi_1$ in a sample drawn from one universe be
greater than an observed proportion $\pi_2$ in a sample drawn from
another universe, but $\pi_1 - \pi_2$ is considerably less than three times
the standard error of the difference, it does not, of course, follow
that the true proportions for the given universes, $p_1$ and $p_2$, are
most probably equal. On the contrary, $p_1$ most likely exceeds $p_2$;
the standard error only warns us that this conclusion is more or
less uncertain, and that possibly $p_2$ may even exceed $p_1$.
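As a rough illustration of this last point, the sketch below expresses an observed difference $\pi_1 - \pi_2$ as a multiple of its standard error. It assumes, for the purpose of the sketch only, the familiar form of the standard error of the difference between proportions in two independent samples of sizes $n_1$ and $n_2$ drawn from a universe with chance $p_0$, namely $[p_0q_0(1/n_1 + 1/n_2)]^{1/2}$, with $p_0$ estimated from the two samples pooled; the function name and the sample figures are hypothetical.

```python
from math import sqrt

def difference_in_standard_errors(x1, n1, x2, n2):
    """Ratio of pi1 - pi2 to its standard error, taking the standard error as
    (p0 q0 (1/n1 + 1/n2)) ** 0.5 with p0 estimated from the pooled samples."""
    pi1, pi2 = x1 / n1, x2 / n2
    p0 = (x1 + x2) / (n1 + n2)
    se = sqrt(p0 * (1 - p0) * (1 / n1 + 1 / n2))
    return (pi1 - pi2) / se

# Hypothetical samples: 260 successes in 500 against 240 successes in 500
ratio = difference_in_standard_errors(260, 500, 240, 500)
print(round(ratio, 2))
```

Here $\pi_1 - \pi_2 = 0.04$ and the standard error is about 0.032, so the difference is only about 1.26 times its standard error, well under three. As the text argues, this does not make $p_1 = p_2$ the most probable conclusion; $p_1$ most likely still exceeds $p_2$, but the conclusion is uncertain.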

9. Let us now consider the effect, on the standard-deviation of
sampling, of divergences from the conditions of simple sampling
which were laid down in § 8 of Chap. XIII.

First suppose the condition (a) to break down, so that there is
some essential difference between the localities from which, or the
        <pb n="308" />
conditions under which, samples are drawn, or that some essential
change has taken place during the period of sampling. We may
represent such circumstances in a case of artificial chance by
supposing that for the first $f_1$ throws of $n$ dice the chance of
success for each die is $p_1$, for the next $f_2$ throws $p_2$, for the next $f_3$
throws $p_3$, and so on, the chance of success varying from time to
time, just as the chance of death, even for individuals of the same
age and sex, varies from district to district. Suppose, now, that
the records of all these throws are pooled together. The mean
number of successes per throw of the $n$ dice is given by

$$M = \frac{n}{N}(f_1p_1 + f_2p_2 + f_3p_3 + \cdots) = np_0,$$

where $N = \Sigma(f)$ is the whole number of throws and $p_0$ is the mean
value $\Sigma(fp)/N$ of the varying chance $p$. To find the
standard-deviation of the number of successes at each throw, consider that
the first set of throws contributes to the sum of the squares of
deviations an amount

$$f_1[np_1q_1 + n^2(p_1 - p_0)^2],$$

$np_1q_1$ being the square of the standard-deviation for these throws,
and $n(p_1 - p_0)$ the difference between the mean number of
successes for the first set and the mean for all the sets together.
Hence the standard-deviation $\sigma$ of the whole distribution is given
by the sum of all quantities like the above, or

$$N\sigma^2 = n\Sigma(fpq) + n^2\Sigma f(p - p_0)^2.$$

Let $\sigma_p$ be the standard-deviation of $p$; then the last sum is
$N\sigma_p^2$, and substituting $1 - p$ for $q$ (so that
$\Sigma(fpq) = Np_0 - N(\sigma_p^2 + p_0^2)$) we have

$$\sigma^2 = np_0 - np_0^2 - n\sigma_p^2 + n^2\sigma_p^2$$
$$\sigma^2 = np_0q_0 + n(n-1)\sigma_p^2. \tag{1}$$

This is the formula corresponding to equation (1) of Chap.
XIII.: if we deal with the standard-deviation of the proportion
of successes, instead of that of the absolute number, we have,
dividing through by $n^2$, the formula corresponding to equation
(2) of Chap. XIII., viz.

$$s^2 = \frac{p_0q_0}{n} + \frac{n-1}{n}\sigma_p^2. \tag{2}$$
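Equation (1) can be checked numerically by building the pooled frequency-distribution of successes directly, set by set, from the binomial terms for $n$ dice, and comparing its variance with $np_0q_0 + n(n-1)\sigma_p^2$. A minimal sketch; the three sets of throws $(f, p)$ are hypothetical numbers chosen only for illustration, and the function names are illustrative.

```python
from math import comb, isclose

def pooled_variance(n, sets):
    """Variance of the number of successes per throw of n dice when the
    throws of several sets are pooled; set i has f_i throws with chance p_i.
    Built from the full pooled frequency-distribution, not from equation (1)."""
    N = sum(f for f, _ in sets)
    freq = [0.0] * (n + 1)            # expected frequencies of 0..n successes
    for f, p in sets:
        for k in range(n + 1):
            freq[k] += f * comb(n, k) * p**k * (1 - p) ** (n - k)
    mean = sum(k * fk for k, fk in enumerate(freq)) / N
    return sum(fk * (k - mean) ** 2 for k, fk in enumerate(freq)) / N

def equation_1(n, sets):
    """sigma^2 = n p0 q0 + n (n - 1) sigma_p^2, with p0 and sigma_p weighted by f."""
    N = sum(f for f, _ in sets)
    p0 = sum(f * p for f, p in sets) / N
    var_p = sum(f * (p - p0) ** 2 for f, p in sets) / N
    return n * p0 * (1 - p0) + n * (n - 1) * var_p

n = 12
sets = [(300, 0.4), (500, 0.5), (200, 0.65)]   # hypothetical (f, p) pairs
print(isclose(pooled_variance(n, sets), equation_1(n, sets)))
```

With these figures $p_0 = 0.5$ and $\sigma_p^2 = 0.0075$, so both routes give $3 + 132 \times 0.0075 = 3.99$, against 3.0 for $np_0q_0$ alone.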
10. If $n$ be large and $s_0$ be the standard-deviation calculated
from the mean proportion of successes $p_0$ (so that $s_0^2 = p_0q_0/n$),
equation (2) is sensibly of the form

$$s^2 = s_0^2 + \sigma_p^2,$$

        <pb n="309" />
TABLE showing Frequencies of Registration Districts in England and Wales
with Different Proportions of Deaths in Childbirth (including Deaths
from Puerperal Fever) per 1000 Births in the same Year, for the same
Groups of Districts as in the Table of Chap. XIII. § 10. Data from same
source. Decade 1881-90.

[Columns: groups of districts by number of births in the decade (from
1500 to 2500 up to 50,000 to 90,000). Rows: deaths in childbirth per
1000 births, in intervals of 0.5 from 1.5 to 2.0 up to 10.5 to 11.0,
followed by the total, the mean, the standard-deviation, the theoretical
standard-deviation corresponding to mean births, and $\sqrt{s^2 - s_0^2}$.
The entries of the table are not legible in this copy.]
and hence, knowing $s$ and $s_0$, we can find $\sigma_p$, the standard-deviation
of the chance or proportion in the universes from which the
samples have been drawn.
The values of $\sqrt{s^2 - s_0^2}$ are tabulated at the foot of the table
showing the distribution of the proportion of male births in
        <pb n="310" />
certain registration districts of England, in § 10 of Chap. XIII.,
p. 263. It will be seen that in the first group of small districts
there appears to be a significant standard-deviation of some 6
units in the proportion of male births per thousand, but in the
more urban districts this falls to 1 or 2 units; in one case only
does $s$ fall short of $s_0$. In the table on p. 283 are given some
different data relating to the deaths of women in childbirth in the
same groups of districts, and in this case the effect of definite
causes is relatively larger, as one might expect. The values of
$\sqrt{s^2 - s_0^2}$ suggest an almost uniform significant standard-deviation
$\sigma_p = 0.8$ in the deaths of women per thousand births, five out of
the eight values being very close to this average. The figures of
this case also bring out clearly one important consequence of (2),
viz. that if we make $n$ large $s$ becomes sensibly equal to $\sigma_p$, while
if we make $n$ small $s^2$ becomes more nearly equal to $p_0q_0/n$. Hence
if we want to know the significant standard-deviation of the
proportion $p$ (the measure of its fluctuation owing to definite causes),
$n$ should be made as large as possible; if, on the other hand, we
want to obtain good illustrations of the theory of simple sampling,
$n$ should be made small. If $n$ be very large the actual
standard-deviation may evidently become almost indefinitely large
compared with the standard-deviation of sampling. Thus during the
20 years 1855-74 the death-rate in England and Wales fluctuated
round a mean value of 22.2 per thousand with a standard-deviation
of 0.86. Taking the mean population as roughly 21 millions,
the standard-deviation of sampling is approximately

$$\sqrt{\frac{22 \times 978}{21 \times 10^6}} = 0.032.$$

This is only about one twenty-seventh of the actual value.
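The arithmetic of the death-rate example, and the step from $s$ and $s_0$ to $\sigma_p$, can be written out as follows. A minimal sketch; the helper names are illustrative, and rates are handled per thousand throughout, so that $(pq/n)^{1/2}$ per thousand is $[\text{rate} \times (1000 - \text{rate})/n]^{1/2}$.

```python
from math import sqrt

def sampling_sd_per_thousand(rate, population):
    """(p q / n) ** 0.5 for a rate given per thousand, expressed per thousand."""
    return sqrt(rate * (1000 - rate) / population)

def significant_sd(s, s0):
    """sigma_p from s**2 = s0**2 + sigma_p**2 (equation (2) with n large)."""
    if s0 > s:
        raise ValueError("s falls short of s0: no significant variation to estimate")
    return sqrt(s**2 - s0**2)

s0 = sampling_sd_per_thousand(22, 21_000_000)   # mean rate and population from the text
s = 0.86                                         # observed s.d. of the death-rate, 1855-74
print(round(s0, 3), round(s / s0, 1), round(significant_sd(s, s0), 3))
```

This gives $s_0 \approx 0.032$, about one twenty-seventh of 0.86, so that $\sigma_p$ comes out at about 0.859: practically the whole of the observed fluctuation is significant.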

11. Now consider the effect of altering the second condition
of simple sampling, given in § 8 (b) of Chapter XIII., viz. the
condition that the chances $p$ and $q$ shall be the same for every
die or coin in the set, or that the circumstances which regulate the
appearance of the character observed shall be the same for every
individual or every sub-class in each of the universes from which
samples are drawn. Suppose that in the group of $n$ dice thrown the
chances for $m_1$ dice are $p_1, q_1$; for $m_2$ dice, $p_2, q_2$; and so on,
the chances varying for different dice, but being constant
throughout the experiment. The case differs from the last, since
in that case the chances were the same for every die at any one
throw, but varied from one throw to another; now they are
constant from throw to throw, but differ from one die to another, as
they would in any ordinary set of badly made dice. It is required to
find the effect of these differing chances.
        <pb n="311" />
For the mean number of successes we evidently have

$$M = m_1p_1 + m_2p_2 + m_3p_3 + \cdots = np_0,$$

$p_0$ being the mean chance $\Sigma(mp)/n$. To find the standard-deviation
of the number of successes at each throw, it should be noted that
this may be regarded as made up of the number of successes in
the $m_1$ dice for which the chances are $p_1, q_1$, together with the
number of successes amongst the $m_2$ dice for which the chances
are $p_2, q_2$, and so on; and these numbers of successes are all
independent. Hence

$$\sigma^2 = m_1p_1q_1 + m_2p_2q_2 + m_3p_3q_3 + \cdots = \Sigma(mpq).$$

Substituting $1 - p$ for $q$, as before, and using $\sigma_p$ to denote the
standard-deviation of $p$,

$$\sigma^2 = np_0q_0 - n\sigma_p^2, \tag{3}$$

or if $s$ be, as before, the standard-deviation of the proportion of
successes,

$$s^2 = \frac{p_0q_0}{n} - \frac{\sigma_p^2}{n}. \tag{4}$$

12. The effect of the chances varying for the individual dice or
other “events” is therefore to lower the standard-deviation, as
calculated from the mean proportion $p_0$, and the effect may
conceivably be considerable. To take a limiting case, if $p$ be zero
for half the events and unity for the remainder, $p_0 = q_0 = 1/2$ and
$\sigma_p = 1/2$, so that $s$ is zero. To take another illustration, still
somewhat extreme, if the values of $p$ are uniformly distributed over
the whole range between 0 and 1, $p_0 = q_0 = 1/2$ as before, but
$\sigma_p^2 = 1/12 = 0.0833$ (Chap. VIII. § 12, p. 143). Hence $s^2 = 0.1667/n$,
$s = 0.408/\sqrt{n}$, instead of $0.5/\sqrt{n}$, the value of $s$ if the chances are
$1/2$ in every case. In most practical cases, however, the effect will be
much less. Thus the standard-deviation of sampling for a death-rate
of, say, 18 per thousand in a population of uniform age and
one sex is $(18 \times 982)^{1/2}/\sqrt{n} = 133/\sqrt{n}$ per thousand. In a population
of the age composition of that of England and Wales, however, the
death-rate is not, of course, uniform, but varies from a high value in
infancy (say 150 per thousand), through very low values (2 to 4
per thousand) in childhood to continuously increasing values in
old age; the standard-deviation of the rate within such a
population is roughly about 30 per thousand. But the effect of this
        <pb n="312" />
variation on the standard-deviation of simple sampling is quite
small, for, as calculated from equation (4),

$$s^2 = \frac{1}{n}(18 \times 982 - 900),$$
$$s = 130/\sqrt{n},$$

as compared with $133/\sqrt{n}$.
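Equation (3) and the figures of § 12 can be checked in a few lines: first by comparing $\Sigma(mpq)$ with $np_0q_0 - n\sigma_p^2$ for a set of dice, then by computing $s\sqrt{n} = (p_0q_0 - \sigma_p^2)^{1/2}$ for the illustrations above. A minimal sketch; the set of 20 dice is hypothetical and the function names are illustrative.

```python
from math import sqrt, isclose

def variance_direct(groups):
    """sigma^2 = sum of m p q, each group of m dice sharing the chance p."""
    return sum(m * p * (1 - p) for m, p in groups)

def variance_equation_3(groups):
    """sigma^2 = n p0 q0 - n sigma_p^2, with p0 and sigma_p weighted by m."""
    n = sum(m for m, _ in groups)
    p0 = sum(m * p for m, p in groups) / n
    var_p = sum(m * (p - p0) ** 2 for m, p in groups) / n
    return n * p0 * (1 - p0) - n * var_p

groups = [(4, 0.1), (10, 0.3), (6, 0.6)]   # hypothetical set of 20 badly made dice
print(isclose(variance_direct(groups), variance_equation_3(groups)))

# Equation (4) as a coefficient of 1/sqrt(n): s * sqrt(n) = (p0 q0 - sigma_p^2) ** 0.5
uniform_chances = sqrt(0.5 * 0.5 - 1 / 12)     # p spread uniformly over 0 to 1
death_rate_uniform = sqrt(18 * 982)            # per thousand, no variation in p
death_rate_varying = sqrt(18 * 982 - 30**2)    # per thousand, sigma_p = 30 per thousand
print(round(uniform_chances, 3), round(death_rate_uniform), round(death_rate_varying))
```

The second line reproduces the coefficients 0.408, 133 and 130 quoted in the text.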

13. We have finally to pass to the third condition (c) of § 8, Chap.
XIII., and to discuss the effect of a certain amount of dependence
between the several “events” in each sample. We shall suppose,
however, that the two other conditions (a) and (b) are fulfilled,
the chances $p$ and $q$ being the same for every event at every trial,
and constant throughout the experiment. The problem is again
most simply treated on the lines of § 5 of the last chapter. The
standard-deviation for each event is $(pq)^{1/2}$ as before, but the events
are no longer independent: instead, therefore, of the simple
expression

$$\sigma^2 = npq,$$

we must have (cf. Chap. XI. § 2)

$$\sigma^2 = npq + 2pq(r_{12} + r_{13} + \cdots + r_{23} + \cdots),$$

where $r_{12}$, $r_{13}$, etc. are the correlations between the results of the
first and second, first and third events, and so on: correlations
for variables (number of successes) which can only take the
values 0 and 1, but may nevertheless, of course, be treated as
ordinary variables (cf. Chap. XI. § 10). There are $n(n-1)/2$
correlation-coefficients, and if, therefore, $r$ is the arithmetic mean
of the correlations we may write

$$\sigma^2 = npq[1 + r(n-1)]. \tag{5}$$

The standard-deviation of simple sampling will therefore be
increased or diminished according as the average correlation
between the results of the single events is positive or negative,
and the effect may be considerable, as $\sigma$ may be reduced to zero
(when $r = -1/(n-1)$) or increased to $n(pq)^{1/2}$ (when $r = 1$). For the
standard-deviation of the proportion of successes in each sample we
have the equation

$$s^2 = \frac{pq}{n}[1 + r(n-1)]. \tag{6}$$

It should be noted that, as the means and standard-deviations
for our variables are all identical, $r$ is the correlation-coefficient
for a table formed by taking all possible pairs of results in the
$n$ events of each sample.
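Equation (5) can be illustrated by simulation, provided we have some way of making the events positively correlated. The sketch below assumes one convenient mechanism, chosen only for illustration and not taken from the text: each event, with probability `rho`, copies a single outcome shared by the whole sample, and otherwise is decided independently. Each event then still succeeds with chance $p$, and any two events are correlated with $r = \rho^2$. The function name and parameter values are hypothetical.

```python
import random
import statistics

def correlated_successes(n, p, rho, rng):
    """Number of successes among n events, each succeeding with chance p.
    Each event, independently with probability rho, copies one outcome shared
    by the whole sample; otherwise it is drawn afresh. Any two events then
    have correlation r = rho**2 (a model assumed only for this sketch)."""
    shared = p > rng.random()
    total = 0
    for _ in range(n):
        if rho > rng.random():
            total += shared
        else:
            total += p > rng.random()
    return total

rng = random.Random(3)
n, p, rho = 20, 0.3, 0.5
r = rho**2
counts = [correlated_successes(n, p, rho, rng) for _ in range(20_000)]
observed = statistics.pvariance(counts)
equation_5 = n * p * (1 - p) * (1 + r * (n - 1))
print(round(observed, 2), round(equation_5, 2), round(n * p * (1 - p), 2))
```

With $n = 20$, $p = 0.3$ and $r = 0.25$, equation (5) gives $\sigma^2 = 4.2 \times 5.75 = 24.15$, against 4.2 for independent events; the simulated variance should land close to the former, showing how a modest correlation multiplies the variance when $n$ is large.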

        <pb n="313" />
It should also be noted that the case when $r$ is positive covers
the departure from the rules of simple sampling discussed in
§§ 9-10: for if we draw successive samples from different records,
this introduces the positive correlation at once, even although the
results of the events at each trial are quite independent of one
another. Similarly, the case discussed in §§ 11-12 is covered by
the case when $r$ is negative: for if the chances are not the same
for every event at each trial, and the chance of success for some
one event is above the average, the mean chance of success for the
remainder must be below it. The cases (a), (b) and (c) are,
however, best kept distinct, since a positive or negative correlation
may arise for reasons quite different from those discussed in
§§ 9-12.

14. As a simple illustration, consider the important case of
sampling from a limited universe, e.g. of drawing $n$ balls in
succession from the whole number in a bag containing $pw$ white
balls and $qw$ black balls. On repeating such drawings a large
number of times, we are evidently equally likely to get a white
ball or a black ball for the first, second, or $n$th ball of the sample:
the correlation-table formed from all possible pairs of every sample
will therefore tend in the long run to give just the same form of
distribution as the correlation-table formed from all possible pairs
of the $w$ balls in the bag. But from Chap. XI. § 11 we
know that the correlation-coefficient for this table is $-1/(w-1)$,
whence

$$\sigma^2 = npq\left(1 - \frac{n-1}{w-1}\right) = npq\,\frac{w-n}{w-1}.$$

If $n = 1$, we have the obviously correct result that $\sigma = (pq)^{1/2}$, as
in drawing from unlimited material: if, on the other hand, $n = w$,
$\sigma$ becomes zero as it should, and the formula is thus checked for
simple cases. For drawing 2 balls out of 4, $\sigma$ becomes $0.816\,(npq)^{1/2}$;
for drawing 5 balls out of 10, $0.745\,(npq)^{1/2}$; in the case
of drawing half the balls out of a very large number, it approximates
to $(0.5\,npq)^{1/2}$, or $0.707\,(npq)^{1/2}$.
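The formula for a limited universe can be checked exactly: the frequency of $k$ white balls in a sample of $n$ is proportional to the number of ways of choosing $k$ of the white and $n - k$ of the black balls, and the standard-deviation of that distribution can be computed with exact fractions. A minimal sketch; the function names are illustrative, and the bag is assumed to be half white, as in the examples above.

```python
from fractions import Fraction
from math import comb, sqrt

def exact_sd_ratio(n, w, white):
    """Exact s.d. of the number of white balls in n draws without replacement
    from a bag of w balls (white of them white), as a multiple of (n p q) ** 0.5."""
    weights = [comb(white, k) * comb(w - white, n - k) for k in range(n + 1)]
    total = sum(weights)                          # equals comb(w, n)
    mean = Fraction(sum(k * c for k, c in enumerate(weights)), total)
    var = sum(c * (k - mean) ** 2 for k, c in enumerate(weights)) / total
    p = Fraction(white, w)
    return sqrt(var / (n * p * (1 - p)))

def formula_ratio(n, w):
    """Ratio given by the text: ((w - n) / (w - 1)) ** 0.5."""
    return sqrt((w - n) / (w - 1))

for n, w in [(2, 4), (5, 10), (500, 1000)]:
    print(n, w, round(exact_sd_ratio(n, w, w // 2), 3), round(formula_ratio(n, w), 3))
```

The enumeration and the formula agree, giving 0.816, 0.745 and 0.707 for the three cases mentioned in the text.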

In the case of contagious or infectious diseases, or of certain
forms of accident that are apt, if fatal at all, to result in wholesale
deaths, $r$ is positive, and if $n$ be large (as it usually is in such
cases) a very small value of $r$ may easily lead to a very great increase
in the observed standard-deviation. It is difficult to give a really
good example from actual statistics, as the conditions are hardly
ever constant from one year to another, but the following will
        <pb n="314" />
serve to illustrate the point. During the twenty years 1887-1906
there were 2107 deaths from explosions of firedamp or coal-dust
in the coal-mines of the United Kingdom, or an average of 105
deaths per annum. From § 12 of Chap. XIII. it follows that this
should be the square of the standard-deviation of simple sampling,
or the standard-deviation itself approximately 10.3. But the
square of the actual standard-deviation is 7178, or its value 84.7,
the numbers of deaths ranging between 14 (in 1903) and 317
(in 1894). This large standard-deviation, to judge from the
figures, is partly, though not wholly, due to a general tendency to
decrease in the numbers of deaths from explosions in spite of a
large increase in the number of persons employed; but even if we
ignore this, the magnitude of the standard-deviation can be
accounted for by a very small value of the correlation $r$, expressive
of the fact that if an explosion is sufficiently serious to be fatal to
one individual, it will probably be fatal to others also. For if $\sigma_0$
denote the standard-deviation of simple sampling, and $\sigma$ the
standard-deviation of sampling given by equation (5), we have

$$r = \frac{\sigma^2 - \sigma_0^2}{(n-1)\,\sigma_0^2}.$$

Whence, from the above data, taking the number of persons
employed underground at a rough average of 560,000,

$$r = \frac{7073}{560{,}000 \times 105} = 0.00012.$$
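The same calculation written out, solving equation (5) for $r$ with $\sigma_0^2 = npq$. A minimal sketch using the figures of the text; it takes the unrounded mean 2107/20 = 105.35 for $\sigma_0^2$ rather than 105, which does not change the result to two significant figures.

```python
from math import sqrt

deaths, years = 2107, 20
sigma0_sq = deaths / years       # mean deaths per annum = square of s.d. of simple sampling
sigma_sq = 7178                  # square of the actual s.d., as given in the text
n = 560_000                      # rough average number employed underground

# Equation (5): sigma^2 = sigma0^2 [1 + r (n - 1)], solved for r
r = (sigma_sq - sigma0_sq) / ((n - 1) * sigma0_sq)
print(round(sqrt(sigma0_sq), 1), round(sqrt(sigma_sq), 1), f"{r:.5f}")
```

This reproduces the standard-deviations 10.3 and 84.7 and a correlation of about 0.00012.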

15. Summarising the preceding paragraphs, §§ 9-14, we see
that if the chances $p$ and $q$ differ for the various universes,
districts, years, materials, or whatever they may be from which
the samples are drawn, the standard-deviation observed will be
greater than the standard-deviation of simple sampling, as
calculated from the average values of the chances: if the average
chances are the same for each universe from which a sample is
drawn, but vary from individual to individual or from one
sub-class to another within the universe, the standard-deviation
observed will be less than the standard-deviation of simple
sampling as calculated from the mean values of the chances:
finally, if $p$ and $q$ are constant, but the events are no longer
independent, the observed standard-deviation will be greater or
less than the simplest theoretical value according as the
correlation between the results of the single events is positive or
negative. These conclusions further emphasise the need for
caution in the use of standard errors. If we find that the
        <pb n="315" />
standard-deviation in some case of sampling exceeds the
standard-deviation of simple sampling, two interpretations are possible:
either that $p$ and $q$ are different in the various universes from
which samples have been drawn (i.e. that the variations are
more or less definitely significant in the sense of § 13, Chap. XIII.),
or that the results of the events are positively correlated inter
se. If the actual standard-deviation fall short of the
standard-deviation of simple sampling, two interpretations are again
possible: either that the chances $p$ and $q$ vary for different
individuals or sub-classes in each universe, while approximately
constant from one universe to another, or that the results of
the events are negatively correlated inter se. Even if the
actual standard-deviation approaches closely to the
standard-deviation of simple sampling, it is only a conjectural and not
a necessary inference that all the conditions of “simple sampling”
as defined in § 8 of the last chapter are fulfilled. Possibly, for
example, there may be a positive correlation $r$ between the
results of the different events, masked by a variation of the
chances $p$ and $q$ in sub-classes of each universe.

Sampling which fulfils the conditions laid down in § 8 of
Chap. XIII., simple sampling as we have called it, is generally
spoken of as random sampling. We have thought it better to
avoid this term, as the condition that the sampling shall be
random (haphazard) is not the only condition tacitly assumed.

REFERENCES.

See generally the references to Chap. XIII., to which may be
added:

(1) PEARSON, KARL, “On certain Properties of the Hypergeometrical Series,
and on the fitting of such Series to Observation Polygons in the Theory of
Chance,” Philosophical Magazine, 5th Series, vol. xlvii., 1899, p. 236.
(An expansion of one section of ref. 10 of Chap. XIII., dealing with the
first problem of our § 14, i.e. drawing samples from a bag containing
a limited number of white and black balls, from the standpoint of the
frequency-distribution of the number of white or black balls in the
samples.)

(2) GREENWOOD, M., “On Errors of Random Sampling in certain Cases not
suitable for the Application of a ‘Normal Curve of Frequency,’” Biometrika,
vol. ix., 1913, pp. 69-90. (If an event has succeeded p times in
n trials, what are the chances of 0, 1, ... m successes in m subsequent
trials? Tables for small samples.)

EXERCISES.

1. Referring to Question 7 of Chap. XIII., work out the values of the
significant standard-deviation $\sigma_p$ (as in § 10) for each row or group of rows
there given, but taking row 5 with rows 6 and 7.
        <pb n="316" />

2. For all the districts in England and Wales included in the same table
(Table VI., Chap. IX.) the standard-deviation of the proportion of male births
per 1000 of all births is 7.46 and the mean proportion of male births 509.2.
The harmonic mean number of births in a district is 5070. Find the
significant standard-deviation $\sigma_p$.

3. If for one half of $n$ events the chance of success is $p$ and the chance of
failure $q$, whilst for the other half the chance of success is $q$ and the chance of
failure $p$, what is the standard-deviation of the number of successes, the events
being all independent?

4. The following are the deaths from small-pox during the 20 years
1882-1901 in England and Wales : —

1882 1317 1892 431
83 957 93 1457
84 2234 94 820
85 2827 95 223
86 275 96 541
87 506 97 25
88 1026 98 253
89 23 99 174
90 16 1900 85
91 49 1901 356

The death-rate from small-pox being very small, the rule of § 12, Chap.
XIII., may be applied to estimate the standard-deviation of simple sampling.
Assuming that the excess of the actual standard-deviation over this can be
entirely accounted for by a correlation between the results of exposure to risk
of the individuals composing the population, estimate the correlation $r$. The
mean population during the period may be taken in round numbers as 29 millions.

        <pb n="317" />
        CHAPTER XV.
THE BINOMIAL DISTRIBUTION AND THE
NORMAL CURVE.

1-2. Determination of the frequency-distribution for the number of successes
in n events: the binomial distribution—3. Dependence of the form
of the distribution on $p$, $q$ and $n$—4-5. Graphical and mechanical
methods of forming representations of the binomial distribution—
6. Direct calculation of the mean and the standard-deviation from
the distribution—7-8. Necessity of deducing, for use in many
practical cases, a continuous curve giving approximately, for large
values of n, the terms of the binomial series—9. Deduction of the
normal curve as a limit to the symmetrical binomial—10-11. The
value of the central ordinate—12. Comparison with a binomial dis-
tribution for a moderate value of n—13. Outline of the more general
conditions from which the curve can be deduced by advanced methods—
14. Fitting the curve to an actual series of observations—15. Difficulty
of a complete test of fit by elementary methods—16. The table of areas
of the normal curve and its use—17. The quartile deviation and the
“probable error”—18. Illustrations of the application of the normal
curve and of the table of areas.

1. In Chapters XIII. and XIV. the standard-deviation of the
number of successes in $n$ events was determined for the several
more important cases, and the applications of the results indicated.
For the simpler cases of artificial chance it is possible, however, to
go much further, and determine not merely the standard-deviation
but the entire frequency-distribution of the number of successes.
This we propose to do for the case of “simple sampling,” in which
all the events are completely independent, and the chances $p$ and
$q$ the same for each event and constant throughout the trials.
The case corresponds to the tossing of ideally perfect coins
(homogeneous circular discs), or the throwing of ideally perfect dice
(homogeneous cubes).

2. If we deal with one event only, we expect in $N$ trials $Nq$
failures and $Np$ successes. Suppose we now combine with the
results of this first event the results of a second. The two events
are quite independent, and therefore, according to the rule of
        <pb n="318" />
                 Number of Successes.
                 0         1                   2                       3                    4
One event        $Nq$      $Np$
                 $Nq^2$    $Npq + Npq$         $Np^2$
Two events       $Nq^2$    $2Npq$              $Np^2$
                 $Nq^3$    $Npq^2 + 2Npq^2$    $2Np^2q + Np^2q$        $Np^3$
Three events     $Nq^3$    $3Npq^2$            $3Np^2q$                $Np^3$
                 $Nq^4$    $Npq^3 + 3Npq^3$    $3Np^2q^2 + 3Np^2q^2$   $3Np^3q + Np^3q$     $Np^4$
Four events      $Nq^4$    $4Npq^3$            $6Np^2q^2$              $4Np^3q$             $Np^4$
        <pb n="319" />
        XV.—BINOMIAL DISTRIBUTION AND NORMAL CURVE. 293
independence, of the $Nq$ failures of the first event, $(Nq)q$ will be
associated (on an average) with failures of the second event, and
$(Nq)p$ with successes of the second event (cf. row 2 of the scheme
on p. 292). Similarly, of the $Np$ successful first events, $(Np)q$ will
be associated (on an average) with failures of the second event
and $(Np)p$ with successes. In trials of two events we would
therefore expect approximately $Nq^2$ cases of no success, $2Npq$
cases of one success and one failure, and $Np^2$ cases of two successes,
as in row 3 of the scheme. The results of a third event may be
combined with those of the first two in precisely the same way.
Of the $Nq^2$ cases in which both the first two events failed, $(Nq^2)q$
will be associated (on an average) with failure of the third also,
and $(Nq^2)p$ with success of the third. Of the $2Npq$ cases of one
success and one failure, $(2Npq)q$ will be associated with failure
of the third event and $(2Npq)p$ with success, and similarly for
the $Np^2$ cases in which both the first two events succeeded. The
result is that in $N$ trials of three events we should expect $Nq^3$
cases of no success, $3Npq^2$ cases of one success, $3Np^2q$ cases of two
successes, and $Np^3$ cases of three successes, as in row 5 of the
scheme. The scheme is continued for the results of a fourth
event, and it is evident that all the results are included under a
very simple rule: the frequencies of 0, 1, 2, ... successes are
given

for one event by the binomial expansion of N(g +p)

for two events » ” Ng +p)?

for three events i N(g+p)?

for four events . fs N(g+p)t
and soon. Quite generally, in fact :—the Jrequenciesof0,1,2 . . ..
successes in IN trials of n events are given by the successive terms
wn the binomial expansion of N(q +p)", viz.—

n(n —1 n(n—1)(n-2
vy "+ n.g" p+ Lon ) pry ( RS Jolson l
This is the first theoretical expression that we have obtained for
the form of a frequency-distribution,
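The rule of § 2 can be checked by brute force. The Python sketch below (the helper names `enumerate_successes` and `binomial_terms` are hypothetical, chosen for illustration) lists every pattern of success and failure for n independent events with constant chances p and q, adds the expected frequency of each pattern to its number of successes, and compares the totals with the successive terms of N(q + p)ⁿ.

```python
from itertools import product
from math import comb, isclose

def enumerate_successes(n, p, N=1.0):
    """Expected frequencies of 0, 1, ..., n successes in N trials of n
    independent events, found by adding up the chance of every pattern."""
    q = 1 - p
    freq = [0.0] * (n + 1)
    for pattern in product((0, 1), repeat=n):      # 1 = success, 0 = failure
        r = sum(pattern)
        freq[r] += N * p**r * q**(n - r)
    return freq

def binomial_terms(n, p, N=1.0):
    """Successive terms of the expansion N(q + p)^n."""
    q = 1 - p
    return [N * comb(n, r) * q**(n - r) * p**r for r in range(n + 1)]

brute = enumerate_successes(4, 0.3, N=10_000)
formula = binomial_terms(4, 0.3, N=10_000)
print([round(f) for f in brute])
print(all(isclose(a, b) for a, b in zip(brute, formula)))
```

The enumeration grows as 2ⁿ, which is exactly why the closed binomial form matters.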

3. The general form of the distributions given by such
binomial series will have been evident from the experimental
examples given in Chapter XIII, i.e. they are distributions
of greater or less asymmetry, tailing off in either direction
from the mode. The distribution is, however, of so much
importance that it is worth while considering the form in
greater detail. This form evidently depends (1) on the values
of q and p, (2) on the value of the exponent n. If p and q
are equal, evidently the distribution must be symmetrical, for
        <pb n="320" />
        294 THEORY OF STATISTICS,
p and q may be interchanged without altering the value of
any term, and consequently terms equidistant from either
end of the series are equal. If p and q are unequal, on the
other hand, the distribution is asymmetrical, and the more
asymmetrical, for the same value of n, the greater the inequality
of the chances. The following table shows the calculated
distributions for n = 20 and values of p, proceeding by 0.1,
from 0.1 to 0.5.

A. Terms of the Binomial Series 10,000 (q + p)²⁰ for Values of p
from 0.1 to 0.5. (Figures given to the nearest unit.)

Number of      p=0.1   p=0.2   p=0.3   p=0.4   p=0.5
Successes.     q=0.9   q=0.8   q=0.7   q=0.6   q=0.5
 0              1216     115       8       -       -
 1              2702     576      68       5       -
 2              2852    1369     278      31       2
 3              1901    2054     716     123      11
 4               898    2182    1304     350      46
 5               319    1746    1789     746     148
 6                89    1091    1916    1244     370
 7                20     545    1643    1659     739
 8                 4     222    1144    1797    1201
 9                 1      74     654    1597    1602
10                 -      20     308    1171    1762
11                 -       5     120     710    1602
12                 -       1      39     355    1201
13                 -       -      10     146     739
14                 -       -       2      49     370
15                 -       -       -      13     148
16                 -       -       -       3      46
17                 -       -       -       -      11
18                 -       -       -       -       2
19                 -       -       -       -       -
20                 -       -       -       -       -

When p = 0.1, cases of two successes are the most frequent, but
cases of one success almost equally frequent: even nine successes
may, however, occur about once in 10,000 trials. As p is increased,
the position of the maximum frequency gradually advances, and the
two tails of the distribution become more nearly equal, until
p = 0.5, when the distribution is symmetrical. Of course, if the
table were continued, the distribution for p = 0.6 would be similar
to that for q = 0.6 (i.e. p = 0.4) but reversed end for end, and so
on. Since the standard-deviation is √(npq) and the maximum value
of pq is given by p = q, the symmetrical distribution has the
greatest dispersion.
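Table A can be recomputed directly from the terms of the series. A minimal Python sketch, assuming only the standard library; `binomial_row` is a hypothetical helper name, and zeros are printed where the table shows a dash:

```python
from math import comb

def binomial_row(n, p, N=10_000):
    """Terms of N(q + p)^n for 0, 1, ..., n successes, to the nearest unit."""
    q = 1 - p
    return [round(N * comb(n, r) * q**(n - r) * p**r) for r in range(n + 1)]

chances = (0.1, 0.2, 0.3, 0.4, 0.5)
table_a = {p: binomial_row(20, p) for p in chances}

print("succ." + "".join(f"{'p=' + str(p):>8}" for p in chances))
for r in range(21):
    print(f"{r:>5}" + "".join(f"{table_a[p][r]:>8}" for p in chances))
```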

        <pb n="321" />
If p = q the effect of increasing n is to raise the mean and
increase the dispersion. If p is not equal to q, however, not
only does an increase in n raise the mean and increase the
dispersion, but it also lessens the asymmetry; the greater
n, for the same values of p and q, the less the asymmetry.
Thus if we compare the first distribution of the above table
with that given by n = 100, we have the following:

B. Terms of the Binomial Series 10,000 (0.9 + 0.1)¹⁰⁰. (Figures given
to the nearest unit.)

Number of    Frequency.    Number of    Frequency.    Number of    Frequency.
Successes.                 Successes.                 Successes.
 0                -         8             1148        16              193
 1                3         9             1304        17              106
 2               16        10             1319        18               54
 3               59        11             1199        19               26
 4              159        12              988        20               12
 5              339        13              743        21                5
 6              596        14              513        22                2
 7              889        15              327        23                1

The maximum frequencies now occur for 9 and 10 successes,
and the two “tails” are much more nearly equal. If, on the
other hand, n is reduced to 2, the distribution is:

Number of Successes.    Frequency.
0                         8100
1                         1800
2                          100

and the maximum frequency is at one end of the range. Whatever
the values of p and q, if n is only increased sufficiently, the
distribution may be treated as sensibly symmetrical, the necessary
condition being (we state this without proof) that p − q shall be
small compared with the standard-deviation √(npq). It is left
to the student to calculate as an exercise the theoretical
distributions corresponding to the experimental results cited in
Chapter XIII. (Question 1).
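Table B, the case n = 2, and the condition just stated can all be checked numerically. The sketch below reuses a hypothetical `binomial_row` helper and measures (q − p)/√(npq) with a helper we call `asymmetry_ratio`; this ratio also happens to be the usual moment coefficient of skewness of the binomial, a standard result not derived in this chapter.

```python
from math import comb, sqrt

def binomial_row(n, p, N=10_000):
    """Terms of N(q + p)^n for 0, 1, ..., n successes, to the nearest unit."""
    q = 1 - p
    return [round(N * comb(n, r) * q**(n - r) * p**r) for r in range(n + 1)]

def asymmetry_ratio(n, p):
    """(q - p) / sqrt(npq): the inequality of the chances measured against
    the standard-deviation, as in the condition stated in the text."""
    q = 1 - p
    return (q - p) / sqrt(n * p * q)

table_b = binomial_row(100, 0.1)
print(table_b[:24])                 # Table B, 0 to 23 successes
print(binomial_row(2, 0.1))         # the case n = 2
for n in (2, 20, 100):
    print(n, round(asymmetry_ratio(n, 0.1), 3))
```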
4. The property of the binomial series used in the scheme of
§ 2 for deducing the series with exponent n from that with
exponent n − 1 leads to two interesting methods (graphical and
mechanical) for constructing approximate representations of
        <pb n="322" />
binomial distributions. It will have been noted that any one
term, say the rth, in one series is obtained by taking q times the
rth term together with p times the (r − 1)th term of the preceding
series. Now if AP, CR (fig. 46) be two verticals, and a third,
BQ, be erected between them, cutting PR in Q, so that
AB : BC = q : p, then

BQ = p.AP + q.CR.

(This follows at once on joining AR and considering the two
segments into which BQ is divided.)

Fig. 46.

Consider then some binomial, say for the case p = ¼, q = ¾.
Draw a series of verticals (the heavy verticals of fig. 47) at any
convenient distance apart on a horizontal base line, and erect
other verticals (the lighter verticals) dividing the distance
between them in the ratio of q : p, viz. 3 : 1. Next, choosing a
vertical scale, draw the binomial polygon for the simplest case
n = 1; in the diagram N has been taken = 4096, and the polygon
is abcd, 0b = 3072, 1c = 1024. The polygons for higher values of
n may now be constructed graphically. Mark the points where ab,
bc, cd respectively cut the intermediate verticals and project them
horizontally to the right on to the thick verticals. This gives the
polygon ab′c′d′e for n = 2. For 0b′ = q.0b, 1c′ = p.0b + q.1c, and
so on. Similarly, if the points where ab′, b′c′, etc., cut the
intermediate verticals are projected horizontally on to the thick
verticals, we have the polygon ab″c″d″e″f for n = 3. The process
may be continued

        <pb n="323" />
indefinitely, though it will be found difficult to maintain any
high degree of accuracy after the first few constructions.

Fig. 47.
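The graphical construction can be imitated arithmetically. In the sketch below (the function name `next_polygon` is hypothetical), each side of the polygon is read off where it crosses the light vertical standing a fraction q of the way across, and that height is carried across to the next thick vertical; exact fractions are used so that the figures for N = 4096, p = ¼ come out as whole numbers.

```python
from fractions import Fraction

def next_polygon(heights, q):
    """One step of the graphical construction of section 4.

    heights: ordinates on the thick verticals 0, 1, ... (the polygon is taken
    to come down to the base line one vertical beyond each end).  Between thick
    verticals r-1 and r a light vertical stands a fraction q of the way across;
    the height at which the polygon side crosses it is projected horizontally
    on to thick vertical r.
    """
    padded = [Fraction(0)] + list(heights) + [Fraction(0)]
    new = []
    for r in range(len(heights) + 1):
        left, right = padded[r], padded[r + 1]
        # straight line from left to right, read off at the fraction q:
        # left + q(right - left) = p.left + q.right, the rule BQ = p.AP + q.CR
        new.append(left + q * (right - left))
    return new

p, q = Fraction(1, 4), Fraction(3, 4)
polygon = [4096 * q, 4096 * p]           # n = 1: 0b = 3072, 1c = 1024
for n in (2, 3, 4):
    polygon = next_polygon(polygon, q)
    print(n, [int(h) for h in polygon])
```

On paper the accuracy decays with each construction, as the text warns; in exact arithmetic it does not.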
5. The mechanical method of constructing the representation of
a binomial series is indicated diagrammatically by fig. 48. The
        <pb n="324" />

apparatus consists of a funnel opening into a space—say a § inch in
depth— between a sheet of glass and a back-board. This space is
broken up by successive rows of wedges like 1; 2, 3; 4, 5, 6; etc., which
will divide up into streams any granular material such as shot or
mustard seed which is poured through the funnel when the
apparatus is held at a slope. At the foot these wedges are
replaced by vertical strips, in the spaces between which the
material can collect.

Fig. 48. The Pearson-Galton Binomial Apparatus.

Consider the stream of material that
comes from the funnel and meets the wedge 1. This wedge is
set so as to throw q parts of the stream to the left and p parts
to the right (of the observer). The wedges 2 and 3 are set so as
to divide the resultant streams in the same proportions. Thus
wedge 2 throws q² parts of the original material to the left and
qp to the right, wedge 3 throws pq parts of the original material
to the left and p² to the right. The streams passing these wedges
are therefore in the ratio of q² : 2qp : p². The next row of wedges
is again set so as to divide these streams in the same proportions

        <pb n="325" />
as before, and the four streams that result will bear the
proportions q³ : 3q²p : 3qp² : p³. The final set, at the heads of the
vertical strips, will give the streams proportions q⁴ : 4q³p : 6q²p² :
4qp³ : p⁴, and these streams will accumulate between the strips
and give a representation of the binomial by a kind of histogram,
as shown. Of course as many rows of wedges may be provided
as may be desired.

This kind of apparatus was originally devised by Sir Francis
Galton (ref. 1) in a form that gives roughly the symmetrical
binomial, a stream of shot being allowed to fall through rows of
nails, and the resultant streams being collected in partitioned
spaces. The apparatus was generalised by Professor Pearson,
who used rows of wedges fixed to movable slides, so that they
could be adjusted to give any ratio of q : p. (Ref. 13.)

6. The values of the mean and standard-deviation of a binomial
distribution may be found from the terms of the series directly,
as well as by the method of Chap. XIII. (the calculation was
in fact given as an exercise in Question 8, Chap. VII., and
Question 6, Chap. VIII.). Arrange the terms under each other
as in col. 1 below, and treat the problem as if it were an
arithmetical example, taking the arbitrary origin at 0 successes: as
N is a factor all through, it may be omitted for convenience.

(1)                            (2)      (3)                            (4)
Frequency f.                   Dev. ξ.  fξ.                            fξ².
qⁿ                             0        0                              0
nqⁿ⁻¹p                         1        nqⁿ⁻¹p                         nqⁿ⁻¹p
n(n−1)/(1·2) qⁿ⁻²p²            2        n(n−1)qⁿ⁻²p²                   2n(n−1)qⁿ⁻²p²
n(n−1)(n−2)/(1·2·3) qⁿ⁻³p³     3        n(n−1)(n−2)/(1·2) qⁿ⁻³p³       3n(n−1)(n−2)/(1·2) qⁿ⁻³p³
. . .                          . . .    . . .                          . . .

The sum of col. 1 is of course unity, i.e. we are treating N as
unity, and the mean is therefore given by the sum of the terms
in col. (3). But this sum is

$$ np\left\{ q^{n-1} + (n-1)\,q^{n-2}p + \frac{(n-1)(n-2)}{1\cdot 2}\,q^{n-3}p^{2} + \cdots \right\} = np\,(q+p)^{n-1} = np. $$

That is, the mean M is np, as by the method of Chap. XIII.
        <pb n="326" />

The square of the standard-deviation is given by the sum of
the terms in col. (4) less the square of the mean, that is,

$$ \sigma^{2} = np\left\{ q^{n-1} + 2(n-1)\,q^{n-2}p + 3\,\frac{(n-1)(n-2)}{1\cdot 2}\,q^{n-3}p^{2} + \cdots \right\} - n^{2}p^{2}. $$

But the series in the bracket is the binomial series (q + p)ⁿ⁻¹
with the successive terms multiplied by 1, 2, 3, . . . It therefore
gives the mean of that binomial measured from an origin at −1,
i.e. its mean (n − 1)p increased by unity, and its sum is therefore
(n − 1)p + 1. Therefore

$$ \sigma^{2} = np\{(n-1)p + 1\} - n^{2}p^{2} = np - np^{2} = npq. $$
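The column sums of § 6 can be carried out in exact arithmetic to confirm that the mean is np and the square of the standard-deviation npq. A small sketch using Python's `fractions` module; `binomial_moments` is a hypothetical name:

```python
from fractions import Fraction
from math import comb

def binomial_moments(n, p):
    """Sum of frequencies, mean, and variance of the terms of (q + p)^n,
    N being treated as unity and the origin taken at 0 successes."""
    q = 1 - p
    f = [comb(n, r) * q**(n - r) * p**r for r in range(n + 1)]   # column (1)
    total = sum(f)                                         # unity
    mean = sum(r * fr for r, fr in enumerate(f))           # sum of column (3)
    second = sum(r * r * fr for r, fr in enumerate(f))     # sum of column (4)
    return total, mean, second - mean**2

n, p = 12, Fraction(1, 3)
total, mean, var = binomial_moments(n, p)
print(total, mean, var, n * p, n * p * (1 - p))
```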

7. The terms of the binomial series thus afford a means of
completely describing a certain class of frequency-distributions,
i.e. of giving not merely the mean and standard-deviation in
each case, but of describing the whole form of the distribution.
If N samples of n cards each be drawn from an indefinitely large
record of cards marked with A or a, the proportion of A-cards
in the record being p, then the successive terms of the series
N(q + p)ⁿ give the frequencies to be expected in the long run of
0, 1, 2, . . . A-cards in the sample, the actual frequencies only
deviating from these by errors which are themselves fluctuations
of sampling. The three constants N, p, n, therefore, determine
the average or smoothed form of the distribution to which actual
distributions will more or less closely approximate.
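The card-drawing experiment can be simulated to see actual frequencies scatter about the terms of N(q + p)ⁿ. In this sketch the helper `draw_samples` and the pseudo-random draws are assumptions for illustration (each card is taken to be an A-card independently with chance p); the standard error printed for each frequency is the √{f(N − f)/N} used later in § 15.

```python
import random
from math import comb, sqrt

def draw_samples(N, n, p, seed=0):
    """Hypothetical experiment: draw N samples of n cards each from an
    indefinitely large record in which a proportion p are A-cards, and count
    how many samples contain 0, 1, ..., n A-cards."""
    rng = random.Random(seed)
    counts = [0] * (n + 1)
    for _ in range(N):
        a_cards = sum(1 for _ in range(n) if p > rng.random())
        counts[a_cards] += 1
    return counts

N, n, p = 10_000, 10, 0.3
actual = draw_samples(N, n, p)
expected = [N * comb(n, r) * (1 - p)**(n - r) * p**r for r in range(n + 1)]
for r in range(n + 1):
    se = sqrt(expected[r] * (N - expected[r]) / N)   # standard error of a single frequency
    print(f"{r:>2} {actual[r]:>6} {expected[r]:>9.1f}   s.e. {se:.1f}")
```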

Considered, however, as a formula which may be generally
useful for describing frequency-distributions, the binomial series
suffers from a serious limitation, viz. that it only applies to a
strictly discontinuous distribution like that of the number of
A-cards drawn from a record containing A’s and a’s, or the number
of heads thrown in tossing a coin. The question arises whether
we can pass from this discontinuous formula to an equation
suitable for representing a continuous distribution of frequency.

8. Such an equation becomes, indeed, almost a necessity for
certain cases with which we have already dealt. Consider, for
example, the frequency-distribution of the number of male births
in batches of 10,000 births, the mean number being, say, 5100.
The distribution will be given by the terms of the series
(0.49 + 0.51)¹⁰'⁰⁰⁰, and the standard-deviation is, in round numbers,
50 births. The distribution will therefore extend to some 150
births or more on either side of the mean number, and in order
to obtain it we should have to calculate some 300 terms of a
binomial series with an exponent of 10,000! This would not
only be practically impossible without the use of certain methods
of approximation, but it would give the distribution in quite

        <pb n="327" />
unnecessary detail: as a matter of practice, we would not have
computed a frequency-distribution by single male births, but
would certainly have grouped our observations, taking probably
10 births as the class-interval. We want, therefore, to replace the
binomial series by some continuous curve, having approximately
the same ordinates, the curve being such that the area between
any two ordinates y₁ and y₂ will give the frequency of observations
between the corresponding values of the variable x₁ and x₂.
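With a modern computer the 300 terms are no longer impractical if factorials are handled through logarithms. The sketch below uses the log-gamma function from Python's `math` module (a present-day device, not one of the approximation methods the text has in mind; `log_binomial_term` is a hypothetical name) with the chance of a male birth taken as 0.51, as in the text.

```python
from math import exp, lgamma, log, sqrt

def log_binomial_term(n, r, p):
    """Natural logarithm of C(n, r) q^(n - r) p^r, computed with log-gamma so
    that factorials of numbers near 10,000 are never formed."""
    q = 1 - p
    return (lgamma(n + 1) - lgamma(r + 1) - lgamma(n - r + 1)
            + (n - r) * log(q) + r * log(p))

n, p = 10_000, 0.51
mean, sd = n * p, sqrt(n * p * (1 - p))
print(round(mean), round(sd, 2))         # about 5100 and 50 births

# Share of batches lying within 150 births of the mean: 301 terms.
inside = sum(exp(log_binomial_term(n, r, p)) for r in range(4950, 5251))
print(round(inside, 4))
```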

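9. It is possible to find such a continuous limit to the binomial
series for any values of p and q, but in the present work we will
confine ourselves to the simplest case in which p = q = 0.5, and the
binomial is symmetrical. The terms of the series are

$$ N\left(\tfrac{1}{2}\right)^{n}\left\{ 1 + n + \frac{n(n-1)}{1\cdot 2} + \frac{n(n-1)(n-2)}{1\cdot 2\cdot 3} + \cdots \right\} $$

The frequency of m successes is

$$ N\left(\tfrac{1}{2}\right)^{n}\frac{n!}{m!\,(n-m)!} $$

and the frequency of m + 1 successes is derived from this by
multiplying it by (n − m)/(m + 1). The latter frequency is
therefore greater than the former so long as

$$ n - m &gt; m + 1, \quad\text{i.e.}\quad m &lt; \frac{n-1}{2}. $$

Suppose, for simplicity, that n is even, say equal to 2k; then the
frequency of k successes is the greatest, and its value is

$$ y_{0} = N\left(\tfrac{1}{2}\right)^{2k}\frac{(2k)!}{k!\,k!} \tag{1} $$

The polygon tails off symmetrically on either side of this greatest
ordinate. Consider the frequency of k + x successes; the value is

$$ y_{x} = N\left(\tfrac{1}{2}\right)^{2k}\frac{(2k)!}{(k-x)!\,(k+x)!} \tag{2} $$

and therefore

$$ \frac{y_{x}}{y_{0}} = \frac{k(k-1)(k-2)\cdots(k-x+1)}{(k+1)(k+2)(k+3)\cdots(k+x)} $$

or

$$ \frac{y_{x}}{y_{0}} = \frac{1\left(1-\frac{1}{k}\right)\left(1-\frac{2}{k}\right)\cdots\left(1-\frac{x-1}{k}\right)}{\left(1+\frac{1}{k}\right)\left(1+\frac{2}{k}\right)\left(1+\frac{3}{k}\right)\cdots\left(1+\frac{x-1}{k}\right)\left(1+\frac{x}{k}\right)} \tag{3} $$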
        <pb n="328" />

Now let us approximate by assuming, as suggested in § 8, that
k is very large, and indeed large compared with x, so that (x/k)²
may be neglected compared with (x/k). This assumption does
not involve any difficulty, for we need not consider values of x
much greater than three times the standard-deviation, or 3√(k/2),
and the ratio of this to k is 3/√(2k), which is necessarily small if k
be large. On this assumption we may apply the logarithmic
series

$$ \log_{e}(1+\delta) = \delta - \frac{\delta^{2}}{2} + \frac{\delta^{3}}{3} - \cdots $$

to every bracket in the fraction (3), and neglect all terms beyond
the first. To this degree of approximation,

$$ \log_{e}\frac{y_{x}}{y_{0}} = -\frac{2}{k}\{1 + 2 + 3 + \cdots + (x-1)\} - \frac{x}{k} = -\frac{x(x-1)}{k} - \frac{x}{k} = -\frac{x^{2}}{k}. $$

Therefore, finally,

$$ y = y_{0}\,e^{-x^{2}/k} = y_{0}\,e^{-x^{2}/2\sigma^{2}} \tag{4} $$

where, in the last expression, the constant k has been replaced by
the standard-deviation σ, for σ² = k/2.

The curve represented by this equation is symmetrical about
the point x = 0, which gives the greatest ordinate y = y₀. Mean,
median, and mode therefore coincide, and the curve is, in fact, that
drawn in fig. 5, p. 89, and taken as the ideal form of the
symmetrical frequency-distribution in Chap. VI. The curve is generally
known as the normal curve of errors or of frequency, or the law
of error.
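How good is the step from equation (3) to equation (4)? The sketch below evaluates the exact ratio y_x/y₀ of equation (3) with integer arithmetic and sets it beside e^(−x²/k) for k = 500 (an arbitrary illustrative choice, so that σ² = 250), up to about three standard-deviations. `exact_ratio` is a hypothetical name.

```python
from math import exp, prod

def exact_ratio(k, x):
    """y_x / y_0 for the symmetrical binomial with exponent n = 2k (equation (3))."""
    numerator = prod(k - j for j in range(x))            # k(k-1)...(k-x+1)
    denominator = prod(k + j for j in range(1, x + 1))   # (k+1)(k+2)...(k+x)
    return numerator / denominator

k = 500                          # n = 1000, so sigma^2 = k/2 = 250
for x in (5, 15, 30, 47):        # 47 is about three standard-deviations
    print(x, round(exact_ratio(k, x), 5), round(exp(-x * x / k), 5))
```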

10. A normal curve is evidently defined completely by giving
the values of y₀ and σ and assigning the origin of x. If we
desire to make a normal curve fit some given distribution as nearly
as may be, the last two data are given by the standard-deviation
and the mean respectively; the value of y₀ will be given by the
fact that the areas of the two distributions, or the numbers of
observations which these areas represent, must be the same.

This condition does not, however, lead in any simple and
elementary algebraic way to an expression for y₀, though such
a value could be found arithmetically to any desired degree
of approximation. For it is evident that (1) any alteration in

        <pb n="329" />
y₀ produces a proportionate alteration in the area of the curve,
e.g. doubling y₀ doubles every ordinate y, and therefore doubles
the area; (2) any alteration in σ produces a proportionate
alteration in the area, for the values of y are the same for the
same values of x/σ, and therefore doubling σ doubles the distance
of every ordinate from the mean, and consequently doubles the
area. The area of the curve, or the number of observations
represented, is therefore proportional to y₀σ, or we must have

$$ N = a\,y_{0}\,\sigma $$

where a is a numerical constant. The value of a may be found
approximately by taking y₀ and σ both equal to unity, calculating
the values of the ordinates y for equidistant values of x, and
taking the area, or number of observations N, as given by the
sum of the ordinates multiplied by the interval.
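The procedure just described is easily mechanised. This sketch takes y₀ = σ = 1, computes unrounded ordinates at intervals of 0.2 from x = −5 to x = 5, and multiplies their sum by the interval; because the ordinates are not rounded to five decimals, the sum may differ from the tabulated figure in the last place.

```python
from math import exp, pi, sqrt

step = 0.2                                    # ordinates at fifths of a unit
xs = [step * i for i in range(-25, 26)]       # x = -5.0, -4.8, ..., 5.0
ordinates = [exp(-x * x / 2) for x in xs]     # y0 = sigma = 1

total = sum(ordinates)                        # sum of the ordinates
a = total * step                              # sum times the interval
print(round(total, 5), round(a, 5), round(sqrt(2 * pi), 5))
```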

11. The table below gives the values of y for values of x
proceeding by fifths of a unit; the values are, of course, the same
for positive and negative values of x. For the whole curve the
sum of the ordinates will be found to be 12.53318, the interval
being 0.2 units; the area is therefore, approximately, 2.50664,

Ordinates of the Curve y = e^(−x²/2). (For references to more extended
tables, see list on pp. 357-8.)

 x.      y.         Log y.        x.      y.         Log y.
0       1.00000    0.00000       2.6     .03405     2̄.53209
0.2      .98020    1̄.99131       2.8     .01984     2̄.29757
0.4      .92312    1̄.96526       3.0     .01111     2̄.04567
0.6      .83527    1̄.92183       3.2     .00598     3̄.77641
0.8      .72615    1̄.86103       3.4     .00309     3̄.48978
1.0      .60653    1̄.78285       3.6     .00153     3̄.18577
1.2      .48675    1̄.68731       3.8     .00073     4̄.86439
1.4      .37531    1̄.57439       4.0     .00034     4̄.52564
1.6      .27804    1̄.44410       4.2     .00015     4̄.16952
1.8      .19790    1̄.29644       4.4     .00006     5̄.79603
2.0      .13534    1̄.13141       4.6     .00003     5̄.40516
2.2      .08892    2̄.94901       4.8     .00001     6̄.99693
2.4      .05614    2̄.74923       5.0     .00000     6̄.57132
and this is the approximate value of a. The value is more than
sufficiently accurate for practical purposes, for the exact value
is √(2π) = 2.506627 . . . The proof of this value cannot be given
here, but it may be deduced from an important approximate
expression for the factorials of large numbers, due to James

        <pb n="330" />
Stirling (1730). If n be large, we have, to a high degree of
approximation,

$$ n! \approx \sqrt{2\pi n}\left(\frac{n}{e}\right)^{n}. $$

Applying Stirling’s theorem to the factorials in equation (1) we
have

$$ y_{0} = \frac{N}{\sqrt{\pi k}} = \frac{N}{\sigma\sqrt{2\pi}} \tag{5} $$

The complete expression for the normal curve is therefore

$$ y = \frac{N}{\sigma\sqrt{2\pi}}\,e^{-x^{2}/2\sigma^{2}} \tag{6} $$

The exponent may be written x²/c² where c = √2·σ, and this is
the origin of the use of √2·σ (the “modulus”) as a measure
of dispersion, of 1/(√2·σ) as a measure of “precision,” and of 2σ²
as “the fluctuation” (cf. Chap. VIII. § 13). The use of the factor
2 or √2 becomes meaningless if the distribution be not normal.
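Both Stirling’s approximation and the resulting value (5) of the greatest ordinate can be checked numerically. A short sketch; `stirling` is a hypothetical helper name and N = 10,000 is an arbitrary choice:

```python
from math import comb, e, factorial, pi, sqrt

def stirling(n):
    """Stirling's approximation to n!: sqrt(2 pi n) (n/e)^n."""
    return sqrt(2 * pi * n) * (n / e) ** n

for n in (5, 10, 20, 50):
    print(n, round(stirling(n) / factorial(n), 5))     # approaches 1

# Greatest ordinate: equation (1) against equation (5), with N = 10,000
N = 10_000
for k in (5, 32, 200):
    exact = N * comb(2 * k, k) / 4 ** k       # N (1/2)^(2k) (2k)! / (k! k!)
    approx = N / sqrt(pi * k)                 # = N / (sigma sqrt(2 pi)), sigma^2 = k/2
    print(k, round(exact, 2), round(approx, 2))
```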

Another rule cited in Chap. VIII., viz. that the mean deviation
is approximately 4/5 of the standard-deviation, is strictly true
for the normal curve only. For this distribution the mean
deviation = σ√(2/π) = 0.79788 . . . σ: the proof cannot be given
within the limitations of the present work. The rules that a
range of 6 times the standard-deviation includes the great
majority of the observations, and that the quartile deviation is
about 2/3 of the standard-deviation, were also suggested by the
properties of this curve (see below §§ 16, 17).
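Although the proof is omitted, the value σ√(2/π) can be confirmed by numerical integration. This sketch applies the midpoint rule to |x| times the normal curve of unit area; the function name and step size are illustrative assumptions.

```python
from math import exp, pi, sqrt

def normal_mean_deviation(sigma=1.0, step=1e-3, limit=10.0):
    """Mean deviation of the normal curve y = exp(-x^2/2 sigma^2)/(sigma sqrt(2 pi))
    (total area unity), by the midpoint rule over 0 to limit*sigma, doubled
    because the curve is symmetrical."""
    strips = int(limit * sigma / step)
    total = 0.0
    for i in range(strips):
        x = (i + 0.5) * step                 # middle of each strip
        total += x * exp(-x * x / (2 * sigma * sigma)) * step
    return 2 * total / (sigma * sqrt(2 * pi))

md = normal_mean_deviation()
print(round(md, 5), round(sqrt(2 / pi), 5))   # both close to 0.79788
```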

12. In the proof of § 9 the assumption was made that k (the
half of the exponent of the binomial) was very large compared
with x (any deviation that had to be considered). In point
of fact, however, the normal curve gives the terms of the
symmetrical binomial surprisingly closely even for moderate
values of n. Thus if n = 64, k = 32, and the standard-deviation
is 4. Deviations x have therefore to be considered up to ±12
or more, which is over 1/3 of k. As will be seen, however, from
the annexed table, the ordinates of the normal curve agree closely
with those of the binomial (the greatest difference, at the central
ordinate, being only 4 in 10,000 observations) right up to x = ±15.
The closeness of approximation is partly due to the fact that, in
applying the logarithmic series to the fraction on the right of
equation (3), the terms of the second order in the expansions of
corresponding brackets in numerator and denominator cancel each
other: these terms, therefore, do not

        <pb n="331" />
accumulate, but only the terms of the third order. There is
only one second-order term that has been neglected, viz. that due
to the last bracket in the denominator. Even for much lower
values of n than that chosen for the illustration—e.g. 10 or 12
(cf. Qu. 4 at the end of this chapter)—the normal curve still
gives a very fair approximation.

TABLE showing (1) Ordinates of the Binomial Series 10,000 (½ + ½)⁶⁴ and
(2) Corresponding Ordinates of the Normal Curve y = (10,000/4√(2π)) e^(−x²/32).

Term.          Binomial   Normal     Term.          Binomial   Normal
               Series.    Curve.                    Series.    Curve.
32               993        997      24 and 40        136        135
31 and 33        963        967      23 and 41         80         79
30 and 34        878        880      22 and 42         44         44
29 and 35        753        753      21 and 43         22         23
28 and 36        606        605      20 and 44         11         11
27 and 37        459        457      19 and 45          5          5
26 and 38        326        324      18 and 46          2          2
25 and 39        217        216      17 and 47          1          1
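The comparison table for n = 64 can be regenerated as follows: the binomial ordinates come from exact integer coefficients, and the normal ordinates from equations (4) and (5) with σ = 4. A minimal sketch using only the standard library:

```python
from math import comb, exp, pi, sqrt

N, n = 10_000, 64
k, sigma = n // 2, sqrt(n) / 2                 # k = 32; sigma = sqrt(npq) = 4
y0_normal = N / (sigma * sqrt(2 * pi))         # equation (5)

rows = []
for x in range(16):
    binomial = N * comb(n, k + x) / 2 ** n
    normal = y0_normal * exp(-x * x / (2 * sigma * sigma))
    rows.append((k - x, k + x, round(binomial), round(normal)))

for low, high, b, c in rows:
    print(f"{low:>3} and {high:>3}   {b:>5}   {c:>5}")
```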
13. But if the normal curve were limited in its application to
distributions which were certainly of binomial type, its use in
practice (apart from its theoretical applications to many cases of
the theory of sampling) would be very restricted. As suggested,
however, by the illustrations given in Chap. VI., a certain, though
not a large, number of distributions (more particularly among
those relating to measurements on man and other animals) are
approximately of normal form, even although such distributions
have not obviously originated in the same way as a binomial
distribution. Take, for example, the distribution of statures in
the United Kingdom (Chap. VI., Table VI.). The mean stature
is 67.46 inches, the standard-deviation 2.57 inches (the values are
worked out in the illustrations of Chaps. VII. and VIII.), and the
number of observations 8585. This gives y₀ = 1333, and all the
data necessary for plotting a normal curve of the same mean and
standard-deviation (the process of fitting is dealt with at greater
length in § 14 below). The two distributions are shown together
in fig. 49, the continuous curve being the normal curve, and the
small circles showing the observed frequencies. It is evident that
they agree very closely. Other body measurements, e.g. skull
measurements, etc., also follow the normal law; it also applies to
certain characters in plants (e.g. number of seeds per capsule in
        <pb n="332" />
Nelumbium, Pearl, American Naturalist, Nov. 1906). The question
arises, therefore, why, in such cases, the distribution should be
approximately normal, a form of distribution which we have only
shown to arise if the variable is the sum of a large number of
elements, each of which can take the values 0 and 1 (or other two
constant values), these values occurring independently, and with
equal frequency.

Fig. 49. The Distribution of Stature for Adult Males in the British Isles
(fig. 6, p. 89), fitted with a Normal Curve (frequency against stature
in inches): to avoid confusing the figure, the frequency-polygon has
not been drawn in, the tops of the ordinates being shown by small
circles.

In the first place, it should be stated that the conditions of the
deduction given in § 9 were made a little unnecessarily restricted,
with a view to securing simplicity of algebra. The deduction
may be generalised, whilst retaining the same type of proof, by
assuming that p and q are unequal (provided p − q be small
compared with √(npq), cf. § 3), that p and q are not quite the
same for all the events, that all the events are not quite
independent, or that n is not large, but that some sort of continuous
variation is possible in the values of the elementary variables,
these being no longer restricted to 0 and 1, or two other discrete
values. (Cf. the deduction given by Pearson in ref. 13.) Proceeding
further from this last idea, the deduction may be rendered

        <pb n="333" />
more general still, without introducing the conception of the
binomial at all, by founding the curve on more or less complex
cases of the theory of sampling for variables instead of attributes.
If a variable is the sum (or, within limits, some slightly
more complicated function) of a large number of other variables,
then the distribution of the compound or resultant variable is
normal, provided that the elementary variables are independent,
or nearly so (cf. ref. 6). The forms of the frequency-distributions
of the elementary variables affect the final distribution less
and less as their number is increased: only if their number is
moderate, and the distributions all exhibit a comparatively high
degree of asymmetry of uniform sign, will the same sign of
asymmetry be sensibly evident in the distribution of the compound
variable. On this sort of hypothesis, the expectation of normality
in the case of stature may be based on the fact that it is a highly
compound character (depending on the sizes of the bones of the
head, the vertebral column, and the legs, the thickness of the
intervening cartilage, and the curvature of the spine), the elements
of which it is composed being at least to some extent independent,
i.e. by no means perfectly correlated with each other, and their
frequency-distributions exhibiting no very high degree of
asymmetry of one and the same sign. The comparative rarity of
normal distributions in economic statistics is probably due in part
to the fact that in most cases, while the entire causation is
certainly complex, relatively few causes have a largely predominant
influence (hence also the frequent occurrence of irregular
distributions in this field of work), and in part also to a high
degree of asymmetry in the distributions of the elements on which
the compound variable depends. Errors of observation may in
general be regarded as compounded of a number of elements, due
to various causes, and it was in this connection that the normal
curve was first deduced, and received its name of the curve of
errors, or law of error.
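The tendency described here, for the asymmetry of the elements to fade as more of them are added, can be illustrated by simulation. In this sketch each element is assumed (purely for illustration) to follow an exponential distribution, which is strongly asymmetrical in one direction; the moment coefficient of skewness of the compound variable should then fall roughly as 2/√(number of elements), a standard result not given in the text.

```python
import random
from statistics import fmean

def skewness(values):
    """Moment coefficient of skewness: mean cubed deviation over sd cubed."""
    m = fmean(values)
    m2 = fmean((v - m) ** 2 for v in values)
    m3 = fmean((v - m) ** 3 for v in values)
    return m3 / m2 ** 1.5

rng = random.Random(1)
results = {}
for terms in (1, 4, 16, 64):
    # Each compound value is the sum of `terms` independent elements, every one
    # strongly asymmetrical in the same direction (exponential, skewness 2).
    sums = [sum(rng.expovariate(1.0) for _ in range(terms)) for _ in range(10_000)]
    results[terms] = skewness(sums)
    print(terms, round(results[terms], 3), round(2 / terms ** 0.5, 3))
```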
14. If it be desired to compare some actual distribution
with the normal distribution, the two distributions should be
superposed on one diagram, as in fig. 49, though, of course, on
a much larger scale. When the mean and standard-deviation
of the actual distribution have been determined, y₀ is given by
equation (5); the fit will probably be slightly closer if the
standard-deviation is adjusted by Sheppard’s correction (Chap.
XI. § 4). The normal curve is then most readily drawn by
plotting a scale showing fifths of the standard-deviation along the
base line of the frequency diagram, taking the mean as origin,
and marking over these points the ordinates given by the figures
of the table on p. 303, multiplied in each case by y₀. The curve
        <pb n="334" />

can be drawn freehand, or by aid of a curve ruler, through the
tops of the ordinates so determined. The logarithms of y in the
table on p. 303 are given to facilitate the multiplication. The only
point in which the student is likely to find any difficulty is
in the use of the scales: he must be careful to remember
that the standard-deviation must be expressed in terms of the
class-interval as a unit in order to obtain for y₀ a number of
observations per interval comparable with the frequencies of his
table.
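For the stature figures quoted in § 13, the first method can be sketched as follows. The class-interval is assumed here to be 1 inch, which is consistent with the value y₀ = 1333 given in the text; the loop prints the ordinates at every fifth of a standard-deviation from the mean.

```python
from math import exp, pi, sqrt

N, mean, sd = 8585, 67.46, 2.57      # stature figures quoted in section 13 (inches)
interval = 1.0                       # class-interval in inches (assumed: 1 inch)

sd_in_intervals = sd / interval      # standard-deviation with the class-interval as unit
y0 = N / (sd_in_intervals * sqrt(2 * pi))    # equation (5), observations per interval
print(round(y0))

# Ordinates at every fifth of a standard-deviation from the mean, i.e. the
# figures of the table of ordinates multiplied by y0.
for i in range(-15, 16):
    z = i / 5
    print(f"{mean + z * sd:6.2f}  {y0 * exp(-z * z / 2):7.1f}")
```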

The process may be varied by keeping the normal curve
drawn to one scale, and redrawing the actual distribution
so as to make the area, mean, and standard-deviation the
same. Thus suppose a diagram of a normal curve was printed
once for all to a scale, say, of y₀ = 5 inches, σ = 1 inch, and
it were required to fit the distribution of stature to it.
Since the standard-deviation is 2.57 inches of stature, the
scale of stature is 1 inch = 2.57 inches of stature, or 0.389 inch
= 1 inch of stature; this scale must be drawn on the base of the
normal-curve diagram, being so placed that the mean falls
at 67.46. As regards the scale of frequency-per-interval, this
is given by the fact that the whole area of the polygon showing
the actual distribution must be equal to the area of the
normal curve, that is 5√(2π) = 12.53 square inches. If, therefore,
the scale required is n observations per interval to the inch,
we have, the number of observations being 8585,

$$ n \times 12.53 \times 2.57 = 8585, $$

which gives n = 266.6.

Though the second method saves curve drawing, the first,
on the whole, involves the least arithmetic and the simplest
plotting.
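The arithmetic of the second method is set out below under the same figures. Since the curve’s area is not rounded to 12.53 here, the result comes out at about 266.5 rather than the 266.6 of the text.

```python
from math import pi, sqrt

N, sd = 8585, 2.57                   # observations; standard-deviation in inches of stature
y0_paper, sigma_paper = 5.0, 1.0     # printed curve: y0 = 5 in., sigma = 1 in.

paper_per_stature_inch = sigma_paper / sd           # about 0.389 in. of paper
curve_area = y0_paper * sigma_paper * sqrt(2 * pi)  # about 12.53 square inches
# Each class-interval (1 inch of stature) is paper_per_stature_inch wide, so at
# n observations per interval to the inch the polygon's area is
# N * paper_per_stature_inch / n; setting this equal to curve_area gives n.
n = N * paper_per_stature_inch / curve_area
print(round(paper_per_stature_inch, 3), round(curve_area, 2), round(n, 1))
```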

15. Any plotting of a diagram, or the equivalent arithmetical
comparison of actual frequencies with those given by the
fitted normal distribution, affords, of course, in itself, only a
rough test, of a practical kind, of the normality of the given
distribution. The question whether all the observed differences
between actual and calculated frequencies, taken together,
may have arisen merely as fluctuations of sampling, so that the
actual distribution may be regarded as strictly normal, neglecting
such errors, is a question of a kind that cannot be answered in
an elementary work (cf. ref. 22). At present the student is in
a position to compare the divergences of actual from calculated
frequencies with fluctuations of sampling in the case of single
class-intervals, or single groups of class-intervals only. If the

        <pb n="335" />
expected theoretical frequency in a certain interval is f, the
standard error of sampling is √{f(N − f)/N}; and if the divergence
of the observed from the theoretical frequency exceed some
three times this standard error, the divergence is unlikely to
have occurred as a mere fluctuation of sampling.
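The test for a single interval is a short calculation. The sketch below uses illustrative figures only (an expected frequency of 1300 and an observed frequency of 1340 out of N = 8585; none of these come from the text), and the function name is hypothetical.

```python
from math import sqrt

def divergence_in_standard_errors(observed, expected, N):
    """Standard error sqrt(f(N - f)/N) of the frequency f expected in one
    class-interval, and the observed divergence expressed in those units."""
    se = sqrt(expected * (N - expected) / N)
    return se, (observed - expected) / se

# Illustrative figures only (not taken from the text): one interval of a
# distribution of N = 8585 observations.
se, ratio = divergence_in_standard_errors(observed=1340, expected=1300, N=8585)
verdict = "unlikely to be a mere fluctuation" if abs(ratio) > 3 else "within sampling error"
print(round(se, 1), round(ratio, 2), verdict)
```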

It should be noted, however, that the ordinate of the normal
curve at the middle of an interval does not give accurately the
area of that interval, or the number of observations within it: it
would only do so if the curve were sensibly straight. To deal
strictly with problems as to fluctuations of sampling in the
frequencies of single intervals or groups of intervals, we require,
accordingly, some convenient means of obtaining the number of
observations, in a given normal distribution, lying between any
two values of the variable.

16. If an ordinate be erected at a distance $x/\sigma$ from the mean,
in a normal curve, it divides the whole area into two parts, the
ratio of which is evidently, from the mode of construction of the
curve, independent of the values of $y_0$ and of $\sigma$. The calculation
of these fractions of area for given values of $x/\sigma$, though a long
and tedious matter, can thus be done once for all, and a table
giving the results is useful for the purpose suggested in § 15 and
in many other ways. References to complete tables are cited at
the end of this work (list of tables, pp. 357-8), the short table below
being given only for illustrative purposes. The table shows the
greater fraction of the area lying on one side of any given ordinate;
e.g. 0.53983 of the whole area lies on one side of an ordinate at
$0.1\sigma$ from the mean, and 0.46017 on the other side. It will be
seen that an ordinate drawn at a distance from the mean equal to
the standard-deviation cuts off some 16 per cent. of the whole
area on one side; some 68 per cent. of the area will therefore be
contained between ordinates at $\pm\sigma$. An ordinate at twice the
standard-deviation cuts off only 2.3 per cent., and therefore some
95.4 per cent. of the whole area lies within a range of $\pm 2\sigma$. At
three times the standard-deviation the fraction of area cut off is
reduced to 135 parts in 100,000, leaving 99.7 per cent. within a
range of $\pm 3\sigma$. This is the basis of our rough rule that a range
of 6 times the standard-deviation will in general include the
great bulk of the observations: the rule is founded on, and is only
strictly true for, the normal distribution. For other forms of
distribution it need not hold good, though experience suggests
that it more often holds than not. The binomial distribution,
especially if $p$ and $q$ be unequal, only becomes approximately normal
when $n$ is large, and this limitation must be remembered in applying
the table given, or similar more complete tables, to cases in which
the distribution is strictly binomial.
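The percentages quoted in this section follow from the normal probability integral, which the Python standard library exposes through `math.erf`. A minimal sketch, assuming the standard relation $\frac{1}{2}\{1 + \mathrm{erf}(z/\sqrt{2})\}$ for the greater fraction of area at $z = x/\sigma$; the helper names are ours, not the text's:

```python
import math


def greater_fraction(z):
    """Greater fraction of the area of a normal curve on one side of an ordinate at z = x/sigma."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def fraction_within(k):
    """Fraction of the whole area lying between ordinates at -k*sigma and +k*sigma."""
    return 2.0 * greater_fraction(k) - 1.0


for k in (1, 2, 3):
    cut_off = 1.0 - greater_fraction(k)   # area beyond +k*sigma on one side only
    print(k, round(cut_off, 5), round(fraction_within(k), 4))
```

Running it should reproduce, to the precision quoted, the 16, 2.3 and 0.135 per cent. cut off on one side and the 68, 95.4 and 99.7 per cent. lying within $\pm\sigma$, $\pm 2\sigma$ and $\pm 3\sigma$.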
        <pb n="336" />

TABLE showing the Greater Fraction of the Area of a Normal Curve to One
Side of an Ordinate of Abscissa $x/\sigma$. (For references to more extended
tables, see list on pp. 357-8.)

x/σ   Greater fraction of area   x/σ   Greater fraction of area
0.0   0.50000                    2.1   0.98214
0.1   0.53983                    2.2   0.98610
0.2   0.57926                    2.3   0.98928
0.3   0.61791                    2.4   0.99180
0.4   0.65542                    2.5   0.99379
0.5   0.69146                    2.6   0.99534
0.6   0.72575                    2.7   0.99653
0.7   0.75804                    2.8   0.99744
0.8   0.78814                    2.9   0.99813
0.9   0.81594                    3.0   0.99865
1.0   0.84134                    3.1   0.99903
1.1   0.86433                    3.2   0.99931
1.2   0.88493                    3.3   0.99952
1.3   0.90320                    3.4   0.99966
1.4   0.91924                    3.5   0.99977
1.5   0.93319                    3.6   0.99984
1.6   0.94520                    3.7   0.99989
1.7   0.95543                    3.8   0.99993
1.8   0.96407                    3.9   0.99995
1.9   0.97128                    4.0   0.99997
2.0   0.97725                    4.1   0.99998
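The table can be regenerated in a few lines. A sketch, assuming Python's `math.erf` as the probability integral; the five-figure values it prints should agree with the entries above.

```python
import math


def greater_fraction(z):
    """Greater fraction of the area to one side of an ordinate at z = x/sigma."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# z = 0.0, 0.1, ..., 4.1, laid out in two columns as in the printed table
zs = [i / 10 for i in range(42)]
table = [(z, round(greater_fraction(z), 5)) for z in zs]
for (z_left, a_left), (z_right, a_right) in zip(table[:21], table[21:]):
    print(f"{z_left:3.1f}   {a_left:.5f}      {z_right:3.1f}   {a_right:.5f}")
```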

17. If we try to determine the quartile deviation in terms of
the standard-deviation from the table, we see that it lies between
$0.6\sigma$ and $0.7\sigma$. Interpolating, it is given approximately by

$$Q \approx \left(0.6 + 0.1 \times \frac{0.75000 - 0.72575}{0.75804 - 0.72575}\right)\sigma = \left(0.6 + 0.1 \times \frac{2425}{3229}\right)\sigma \approx 0.675\,\sigma.$$

More exact interpolation gives the value 0.674489750. This result,
again, is the foundation of the rough rule that the semi-interquartile
range is usually some 2/3 of the standard-deviation: it is
strictly true for the normal curve only. It may be noted that
the constant 0.67448975... can be determined by processes of
interpolation only, and cannot be expressed exactly, like the
mean deviation, in terms of any other known constant, such
as $\pi$.
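The interpolation above, and the more exact value, can be checked with Python's `statistics.NormalDist` (available from Python 3.8), whose `inv_cdf` method inverts the probability integral. A sketch; the two table entries are those of the rows 0.6 and 0.7:

```python
from statistics import NormalDist

# Table entries (greater fraction of area) at x/sigma = 0.6 and 0.7
z_lo, area_lo = 0.6, 0.72575
z_hi, area_hi = 0.7, 0.75804

# Linear interpolation for the abscissa cutting off three-quarters of the area
q_interp = z_lo + (z_hi - z_lo) * (0.75 - area_lo) / (area_hi - area_lo)

# The exact upper quartile of a unit normal distribution
q_exact = NormalDist().inv_cdf(0.75)

print(f"interpolated Q = {q_interp:.4f} sigma, exact Q = {q_exact:.8f} sigma")
```

Linear interpolation gives a slightly larger value (0.6751 against 0.6745) because the area curve is concave over this range, so the chord lies below it.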

It has become customary to use 0.674... times the standard
error rather than the standard error itself as a measure of the
        <pb n="337" />
unreliability of observed statistical results, and the term probable
error is given to this quantity. It should be noted that the word
“probable” is hardly used in its usual sense in this connection:
the probable error is merely a quantity such that we may expect
greater and less errors of simple sampling with about equal
frequency, provided always that the distribution of errors is
normal. On the whole, the use of the “probable error” has little
advantage compared with the standard error, and consequently little
stress is laid on it in the present work; but the term is in constant
use, and the student must be familiar with it.

It is true that the “probable error” has a simpler and more direct
significance than the standard error, but this advantage is lost as
soon as we come to deal with multiples of the probable error.
Further, the best modern tables of the ordinates and area of the
normal curve are given in terms of the standard-deviation or
standard error, not in terms of the probable error, and the
multiplication of the former by 0.6745, to obtain the probable error,
is not justified unless the distribution is normal. For very large
samples the distribution is approximately normal, even though $p$
and $q$ are unequal; but this is not so for small samples, such as
often occur in practice. In the case of small samples the use of
the “probable error” is consequently of doubtful value, while the
standard error retains its significance as a measure of dispersion.
The “probable error,” it may be mentioned, is often stated after
an observed proportion with the ± sign before it; a percentage
given as 20.5 ± 2.3 signifying “20.5 per cent., with a probable
error of 2.3 per cent.”

If an error or deviation in, say, a certain proportion $p$ only just
exceeds the probable error, it is as likely as not to occur in simple
sampling: if it exceeds twice the probable error (in either direction),
it is likely to occur as a deviation of simple sampling about 18
times in 100 trials, or the odds are about 4.6 to 1 against its
occurring at any one trial. For a range of three times the probable
error the odds are about 22 to 1, and for a range of four times the
probable error 142 to 1. Until a deviation exceeds, then, 4 times
the probable error, we cannot feel any great confidence that it is
likely to be “significant.” It is simpler to work with the standard
error and take ±3 times the standard error as the critical range:
for this range the odds are about 370 to 1 against such a
deviation occurring in simple sampling at any one trial.
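The odds quoted in the last paragraph follow from the normal curve once the probable error is taken as 0.6745 standard errors. A sketch under that normality assumption (the helper name is ours):

```python
import math

PROBABLE_ERROR = 0.6744897501960817  # probable error in units of the standard error


def two_sided_chance(z):
    """Chance that a normal deviation exceeds z standard errors in either direction."""
    return math.erfc(z / math.sqrt(2.0))


for multiple in (1, 2, 3, 4):
    p = two_sided_chance(multiple * PROBABLE_ERROR)
    print(f"{multiple} x p.e.: {100 * p:.1f} in 100, odds {(1 - p) / p:.1f} to 1 against")

p = two_sided_chance(3.0)
print(f"3 x s.e.: odds {(1 - p) / p:.0f} to 1 against")
```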

18. The following are a few miscellaneous examples of the use
of the normal curve and the table of areas.

Example i.—A hundred coins are thrown a number of times.
How often approximately in 10,000 throws may (1) exactly 65
heads, (2) 65 heads or more, be expected?
        <pb n="338" />

The standard-deviation is $\sqrt{0.5 \times 0.5 \times 100} = 5$. Taking the
distribution as normal, $y_0 = 10{,}000/(5\sqrt{2\pi}) = 797.9$.

The mean number of heads being 50, $65 - 50 = 15 = 3\sigma$. The
frequency of a deviation of $3\sigma$ is given at once by the table (p. 303)
as $797.9 \times 0.0111\ldots = 8.86$, or nearly 9 throws in 10,000. A
throw of 65 heads will therefore be expected about 9 times.

The frequency of throws of 65 heads or more is given by the
area table (p. 310), but a little caution must now be used, owing
to the discontinuity of the distribution. A throw of 65 heads is
equivalent to a range of 64.5-65.5 on the continuous scale of the
normal curve, the division between 64 and 65 coming at 64.5.
$(64.5 - 50)/5 = 2.9$, and a deviation of $+2.9\sigma$ or more will only
occur, as given by the table, 187 times in 100,000 throws, or, say,
19 times in 10,000.
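The arithmetic of Example i can be reproduced as below. A sketch, assuming the normal approximation used in the text; the last computation, which is not in the text, adds the exact binomial expectation of 65 heads (via `math.comb`, Python 3.8+) for comparison only.

```python
import math

n, p, throws = 100, 0.5, 10_000
sigma = math.sqrt(n * p * (1 - p))                 # 5 heads
y0 = throws / (sigma * math.sqrt(2 * math.pi))     # central ordinate, about 797.9

# (1) exactly 65 heads: ordinate of the normal curve at 65 - 50 = 3 sigma
z = (65 - n * p) / sigma
exactly_65 = y0 * math.exp(-z * z / 2)             # about 8.86 throws in 10,000

# (2) 65 heads or more: area beyond 64.5 on the continuous scale
z_edge = (64.5 - n * p) / sigma                    # 2.9
at_least_65 = throws * 0.5 * math.erfc(z_edge / math.sqrt(2))   # about 19 in 10,000

# For comparison only: exact binomial expectation of exactly 65 heads
exact_65 = throws * math.comb(n, 65) / 2 ** n
print(round(y0, 1), round(exactly_65, 2), round(at_least_65, 1), round(exact_65, 2))
```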

Example ii.—Taking the data of the stature-distribution of fig.
49 (mean 67.46, standard-deviation 2.57 in.), what proportion of
all the individuals will be within a range of ±1 inch of the
mean?

1 inch $= 0.389\sigma$. Simple interpolation in the table of p. 310
gives 0.65129 of the area below this deviation, or a more extended
table the more accurate value 0.65136. Within a range of
$\pm 0.389\sigma$ the fraction of the whole area is therefore 0.30272, or the
statures of about 303 per thousand of the given population will lie
within a range of ±1 inch from the mean.
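A sketch of Example ii, showing both the simple interpolation in the short table and the value from the probability integral itself (here `math.erf`, standing in for the more extended table):

```python
import math


def greater_fraction(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


sd = 2.57                       # standard-deviation of stature, inches
z = round(1.0 / sd, 3)          # 1 inch = 0.389 sigma, rounded as in the text

# simple interpolation between the 0.3 and 0.4 rows of the table
interpolated = 0.61791 + (z - 0.3) / 0.1 * (0.65542 - 0.61791)
exact = greater_fraction(z)

within = 2.0 * (exact - 0.5)    # fraction of statures within 1 inch of the mean
print(round(interpolated, 5), round(exact, 5), round(1000 * within))
```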

Example iii.—In a case of crossing a Mendelian recessive by a
heterozygote the expectation of recessive offspring is 50 per cent.
(1) How often would 30 recessives or more be expected amongst 50
offspring owing simply to fluctuations of sampling? (2) How many
offspring would have to be obtained in order to reduce the probable
error to 1 per cent. ?

The standard error of the percentage of recessives for 50
observations is $100\sqrt{pq/n} = 50\sqrt{1/50} = 7.07$. Thirty recessives in fifty is
a deviation of 5 from the mean (25), or, if we take thirty as representing
29.5 or more, 4.5 from the mean; expressed as a percentage of the 50
offspring this is 9 per cent., that is, $9/7.07 = 1.27\sigma$. A positive
deviation of this amount or more occurs about 100 times in 1000,
so that 30 recessives or more would be expected in about one in ten
of the batches of 50 offspring. We have assumed
normality for rather a small value of $n$, but the result is sufficiently
accurate for practical purposes.

As regards the second part of the question we are to have

$$0.6745 \times 50\sqrt{1/n} = 1,$$

$n$ being the number of offspring. This gives $n = 1137$ to the
nearest unit.
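Example iii as a sketch. It follows the normal approximation and, as a check not made in the text, also sums the exact binomial terms for 30 or more recessives out of 50 with `math.comb` (Python 3.8+):

```python
import math

n, p = 50, 0.5
se_percent = 100 * math.sqrt(p * (1 - p) / n)       # 7.07 per cent
excess_percent = 100 * (29.5 - n * p) / n           # 4.5 recessives = 9 per cent
z = excess_percent / se_percent                      # about 1.27 standard errors
normal_chance = 0.5 * math.erfc(z / math.sqrt(2))    # one-sided, about 0.10

# exact binomial chance of 30 or more recessives out of 50
exact_chance = sum(math.comb(n, k) for k in range(30, n + 1)) / 2 ** n

# (2) offspring needed to bring the probable error down to 1 per cent
n_needed = round((0.6745 * 100 * math.sqrt(p * (1 - p)) / 1.0) ** 2)   # 1137
print(round(z, 2), round(normal_chance, 3), round(exact_chance, 3), n_needed)
```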
        <pb n="339" />
Example iv.—The diagram of fig. 49 shows that the number of
statures recorded in the group “62 in. and less than 63” is
markedly less than the theoretical value. Could such a difference
occur owing to fluctuations of simple sampling; and if so, how
often might it happen ?

The actual frequency recorded is 169. To obtain the theoretical
frequency we may either take it as given roughly by the
ordinate in the centre of the interval, or, better, use the integral
table. Remembering that statures were only recorded to the
nearest 1/8 in., the true limits of the interval are 61 15/16-62 15/16 in., or
61.94-62.94, mid-value 62.44. This is a deviation from the
mean (67.46) of 5.02. Calculating the ordinate of the normal
curve directly we find the frequency 197.8. This is certainly, as
is evident from the form of the curve, a little too small. The
interval actually lies between deviations of 4.52 in. and 5.52
in., that is, $1.759\sigma$ and $2.148\sigma$. The corresponding fractions of
area are 0.96071 and 0.98418, difference, or fraction of area
between the two ordinates, 0.02347. Multiplying this by the
whole number of observations (8585) we have the theoretical
frequency 201.5.

The difference of theoretical and observed frequencies is therefore
32.5. But the proportion of observations which should fall into
the given class is 0.023, the proportion falling into other classes
0.977, and the standard error of the class frequency is accordingly
$\sqrt{0.023 \times 0.977 \times 8585} \approx 14$. As the actual deviation is only
2.32 times this, it could certainly have occurred as a fluctuation of
sampling.

The question how often it might have occurred can only be
answered if we assume the distribution of fluctuations of sampling
to be approximately normal. It is true that $p$ and $q$ are very
unequal, but then $n$ is very large (8585), so large that the
difference of the chances is fairly small compared with $\sqrt{npq}$
(about one-fifteenth). Hence we may take the distribution of
errors as roughly normal to a first approximation, though a
first approximation only. The tables give 0.990 of the area
below a deviation of $2.32\sigma$, so we would expect an equal or
greater deficiency to occur about 10 times in 1000 trials, or once
in a hundred.
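Example iv as a sketch, using the probability integral (`math.erf`) directly in place of the interpolated table values, so its figures may differ from the text's in the last place:

```python
import math


def greater_fraction(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


N, mean, sd = 8585, 67.46, 2.57
lower, upper = 61.94, 62.94      # true limits of the class "62 in. and less than 63"
observed = 169

z_near = (mean - upper) / sd     # about 1.759
z_far = (mean - lower) / sd      # about 2.148
share = greater_fraction(z_far) - greater_fraction(z_near)   # about 0.0235
expected = N * share                                           # about 201
se = math.sqrt(N * share * (1 - share))                        # about 14.0
ratio = (expected - observed) / se                             # about 2.3
deficiency_chance = 1 - greater_fraction(ratio)                # about 0.01, once in a hundred
print(round(expected, 1), round(se, 2), round(ratio, 2), round(deficiency_chance, 4))
```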

REFERENCES.
The Binomial Machine.

(1) GALTON, FRANCIS, Natural Inheritance; Macmillan &amp; Co., London, 1889.
(Mechanical method of forming a binomial or normal distribution,
p. 63; for Pearson’s generalised machine, see below,
ref. 13.)
        <pb n="340" />
Frequency Curves.

For the early classical memoirs on the normal curve or law of error
by Laplace, Gauss, and others, see Todhunter’s History (Introduction :
ref. 7). The literature of this subject is too extensive to enable us to do
more than cite a few of the more recent memoirs, of which 6, 7, and 13
are of fundamental importance. The student will find other citations
in 6, 8, and 14.

(2) CHARLIER, C. V. L., “Researches into the Theory of Probability”
(Communications from the Astronomical Observatory, Lund); Lund,
1906.

(3) CUNNINGHAM, E., “The ω-Functions, a Class of Normal Functions
occurring in Statistics,” Proc. Roy. Soc., Series A, vol. lxxxi., 1908,
p. 310.

(4) EDGEWORTH, F. Y., “On the Representation of Statistics by
Mathematical Formulæ,” Jour. Roy. Stat. Soc., vol. lxi., 1898; vol. lxii., 1899;
and vol. lxiii., 1900.

(5) EDGEWORTH, F. Y., Article on the “Law of Error” in the Encyclopædia
Britannica, 10th edn., vol. xxviii., 1902, p. 280.

(6) EDGEWORTH, F. Y., “The Law of Error,” Cambridge Phil. Trans., vol.
xx., 1904, pp. 36-65, 113-141 (and an appendix, pp. i-xiv, not
printed in the Cambridge Phil. Trans.).

(7) EDGEWORTH, F. Y., “The Generalised Law of Error, or Law of Great
Numbers,” Jour. Roy. Stat. Soc., vol. lxix., 1906, p. 497.

(8) EDGEWORTH, F. Y., “On the Representation of Statistical Frequency by
a Curve,” Jour. Roy. Stat. Soc., vol. lxx., 1907, p. 102.

(9) FECHNER, G. T., Kollektivmasslehre (herausgegeben von G. F. Lipps;
Engelmann, Leipzig, 1897).

(10) KAPTEYN, J. C., Skew Frequency Curves in Biology and Statistics;
Noordhoff, Groningen; Wm. Dawson &amp; Sons, London, 1903.

(11) MACALISTER, DONALD, “The Law of the Geometric Mean,” Proc. Roy.
Soc., vol. xxix., 1879, p. 367.

(12) NIXON, J. W., “An Experimental Test of the Normal Law of Error,”
Jour. Roy. Stat. Soc., vol. lxxvi., 1913, pp. 702-706.

(13) PEARSON, KARL, “Skew Variation in Homogeneous Material,” Phil.
Trans. Roy. Soc., Series A, vol. clxxxvi., 1895, p. 343.

For the generalised binomial machine, see (1). The memoir deals
with curves derived from the general binomial, and from a somewhat
analogous series derived from the case of sampling from limited
material. Supplement to the memoir, ibid., vol. cxcvii., 1901, p. 443.
For a derivation of the same curves from a modified standpoint,
ignoring the binomial and analogous distributions, cf. Chap. X.,
ref. 18.

(14) PEARSON, KARL, “Das Fehlergesetz und seine Verallgemeinerungen
durch Fechner und Pearson”: A Rejoinder, Biometrika, vol. iv., 1905,
p. 169.

(15) PEROZZO, LUIGI, “Nuove Applicazioni del Calcolo delle Probabilità allo
Studio dei Fenomeni Statistici e Distribuzione dei Matrimoni secondo
l’Età degli Sposi,” Mem. della Classe di Scienze morali, etc., Reale
Accad. dei Lincei, vol. x., Series 3, 1882.

(16) SHEPPARD, W. F., “On the Application of the Theory of Error to Cases
of Normal Distribution and Normal Correlation,” Phil. Trans. Roy.
Soc., Series A, vol. cxcii., 1898, p. 101. (Includes a geometrical
treatment of the normal curve.)

(17) YULE, G. U., “On the Distribution of Deaths with Age when the Causes
of Death act cumulatively, and similar Frequency-distributions,”
        <pb n="341" />
Jour. Roy. Stat. Soc., vol. lxxiii., 1910, p. 26. (A binomial
distribution with negative index, and the related curve, i.e. a special case of
one of Pearson’s curves, ref. 13.)
The Resolution of a Distribution compounded of two Normal
Curves into its Components.

(18) PEARSON, KARL, “Contributions to the Mathematical Theory of
Evolution (on the Dissection of Asymmetrical Frequency Curves),” Phil.
Trans. Roy. Soc., Series A, vol. clxxxv., 1894, p. 71.

(19) EDGEWORTH, F. Y., “On the Representation of Statistics by
Mathematical Formulæ,” part ii., Jour. Roy. Stat. Soc., vol. lxii., 1899, p. 125.

(20) PEARSON, KARL, “On some Applications of the Theory of Chance to
Racial Differentiation,” Phil. Mag., 6th Series, vol. i., 1901, p. 110.

(21) HELGUERO, FERNANDO DE, “Per la risoluzione delle curve dimorfiche,”
Biometrika, vol. iv., 1905, p. 230. Also memoir under the same title
in the Transactions of the Reale Accademia dei Lincei, Rome, vol. vi.,
1906. (The first is a short note, the second the full memoir.)

See also the memoir by Charlier, cited in (2), section vi. of that
memoir dealing with the problem of dissection.
Testing the Fit of an Observed to a Theoretical or
another Observed Distribution.

(22) PEARSON, KARL, “On the Criterion that a given System of Deviations
from the Probable, in the Case of a Correlated System of Variables, is
such that it can be reasonably supposed to have arisen from random
sampling,” Phil. Mag., 5th Series, vol. l., 1900, p. 157.

(23) PEARSON, KARL, “On the Probability that Two Independent
Distributions of Frequency are really Samples from the same Population,”
Biometrika, vol. viii., 1911, p. 250; also Biometrika, vol. x., 1914,
pp. 85-143.

EXERCISES.

1. Calculate the theoretical distributions for the three experimental cases
(1), (2), and (3) cited in § 7 of Chapter XIII.

2. Show that if np be a whole number, the mean of the binomial coincides
with the greatest term.

3. Show that if two symmetrical binomial distributions of degree n (and
of the same number of observations) are so superposed that the rth term of
the one coincides with the (r+1)th term of the other, the distribution
formed by adding superposed terms is a symmetrical binomial of degree n+ 1.

[Note: it follows that if two normal distributions of the same area and
standard-deviation are superposed so that the difference between the means is
small compared with the standard-deviation, the compound curve is very
nearly normal.]

4. Calculate the ordinates of the binomial $1024(0.5 + 0.5)^{10}$, and compare
them with those of the normal curve.

5. Draw a diagram showing the distribution of statures of Cambridge
students (Chap. VI., Table VII.), and a normal curve of the same area,
mean, and standard-deviation superposed thereon.

6. Compare the values of the semi-interquartile range for the stature
distributions of male adults in the United Kingdom and Cambridge students,
(1) as found directly, (2) as calculated from the standard-deviation, on the
assumption that the distribution is normal.
        <pb n="342" />

7. Taking the mean stature for the British Isles as 67.46 in. (the
distribution of fig. 49), the mean for Cambridge students as 68.85 in., and the
common standard-deviation as 2.56 in., what percentage of Cambridge students
exceed the British mean in stature, assuming the distribution normal?

8. As stated in Chap. XIII., Example ii., certain crosses of Pisum sativum
based on 7125 seeds gave 25.32 per cent. of green seeds instead of the theoretical
proportion 25 per cent., the standard error being 0.51 per cent. In what
percentage of experiments based on the same number of seeds might an equal or
greater percentage be expected to occur owing to fluctuations of sampling
alone?

9. In what proportion of similar experiments based on (1) 100 seeds, (2)
1000 seeds, might (a) 30 per cent. or more, (b) 35 per cent. or more, of green
seeds, be expected to occur, if ever ?

10. In similar experiments, what number of seeds must be obtained to
make the “probable error” of the proportion 1 per cent.?

11. If skulls are classified as dolichocephalic when the length-breadth
index is under 75, mesocephalic when the same index lies between 75 and 80,
and brachycephalic when the index is over 80, find approximately (assuming
that the distribution is normal) the mean and standard-deviation of a series
in which 58 per cent. are stated to be dolichocephalic, 38 per cent.
mesocephalic, and 4 per cent. brachycephalic.
        <pb n="343" />
        CHAPTER XVI.
NORMAL CORRELATION.

1-3. Deduction of the general expression for the normal correlation surface
from the case of independence—4. Constancy of the standard-
deviations of parallel arrays and linearity of the regression—5. The
contour lines: a series of concentric and similar ellipses—6. The
normal surface for two correlated variables regarded as a normal
surface for uncorrelated variables rotated with respect to the axes of
measurement : arrays taken at any angle across the surface are normal
distributions with constant standard-deviation : distribution of and
correlation between linear functions of two normally correlated
variables are normal : principal axes—7. Standard-deviations round
the principal axes—8-11. Investigation of Table III., Chap. IX., to
test normality : linearity of regression, constancy of standard-deviation
of arrays, normality of distribution obtained by diagonal addition,
contour lines—12-13. Isotropy of the normal distribution for two
variables—14. Outline of the principal properties of the normal dis-
tribution for n variables.

1. THE expression that we have obtained for the “normal”
distribution of a single variable may readily be made to yield a
corresponding expression for the distribution of frequency of pairs
of values of two variables. This normal distribution for two
variables, or “normal correlation surface,” is of great historical
importance, as the earlier work on correlation is, almost
without exception, based on the assumption of such a distribution;
though when it was recognised that the properties of the
correlation-coefficient could be deduced, as in Chap. IX., without reference
to the form of the distribution of frequency, a knowledge of
this special type of frequency-surface ceased to be so essential.
But the generalised normal law is of importance in the theory of
sampling : it serves to describe very approximately certain actual
distributions (e.g. of measurements on man) ; and if it can be
assumed to hold good, some of the expressions in the theory of
correlation, notably the standard-deviations of arrays (and, if
more than two variables are involved, the partial correlation-
coefficients), can be assigned more simple and definite meanings
than in the general case. The student should, therefore, be
familiar with the more fundamental properties of the distribution.
        <pb n="344" />

2. Consider first the case in which the two variables are
completely independent. Let the distributions of frequency for the
two variables $x_1$ and $x_2$, singly, be

$$y_1 = \frac{N}{\sigma_1\sqrt{2\pi}}\, e^{-x_1^2/2\sigma_1^2}, \qquad y_2 = \frac{N}{\sigma_2\sqrt{2\pi}}\, e^{-x_2^2/2\sigma_2^2} \qquad (1)$$

Then, assuming independence, the frequency-distribution of pairs
of values must, by the rule of independence ($y_{12} = y_1 y_2/N$), be given by

$$y_{12} = y_0\, e^{-\frac{1}{2}\left(\frac{x_1^2}{\sigma_1^2} + \frac{x_2^2}{\sigma_2^2}\right)} \qquad (2)$$

where

$$y_0 = \frac{N}{2\pi\sigma_1\sigma_2} \qquad (3)$$

Equation (2) gives a normal correlation surface for one special
case, the correlation-coefficient being zero. If we put $x_2$ = a
constant, we see that every section of the surface by a vertical plane
parallel to the $x_1$ axis, i.e. the distribution of any array of $x_1$'s, is
a normal distribution, with the same mean and standard-deviation
as the total distribution of $x_1$'s, and a similar statement holds for
the arrays of $x_2$'s; these properties must hold good, of course, as
the two variables are assumed independent (cf. Chap. V. § 13).
The contour lines of the surface, that is to say, lines drawn on
the surface at a constant height, are a series of similar ellipses
with major and minor axes parallel to the axes of $x_1$ and $x_2$ and
proportional to $\sigma_1$ and $\sigma_2$, the equations to the contour lines being
of the general form

$$\frac{x_1^2}{\sigma_1^2} + \frac{x_2^2}{\sigma_2^2} = \text{constant} \qquad (4)$$

Pairs of values of $x_1$ and $x_2$ related by an equation of this form
are, therefore, equally frequent.

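3. To pass from this special case of independence to the general
case of two correlated variables, remember (Chap. XII. § 8)
that if

$$x_{1.2} = x_1 - b_{12}x_2, \qquad x_{2.1} = x_2 - b_{21}x_1,$$

$x_2$ and $x_{1.2}$, as also $x_1$ and $x_{2.1}$, are uncorrelated. If they are not
merely uncorrelated but completely independent, and if the distribution
        <pb n="345" />
of each of the deviations singly be normal, we must have
for the frequency-distribution of pairs of deviations of $x_2$ and $x_{1.2}$

$$y_{12} = y_0\, e^{-\frac{1}{2}\left(\frac{x_2^2}{\sigma_2^2} + \frac{x_{1.2}^2}{\sigma_{1.2}^2}\right)} \qquad (5)$$

But, since $b_{12} = r\sigma_1/\sigma_2$ and $\sigma_{1.2}^2 = \sigma_1^2(1 - r^2)$,

$$\frac{x_2^2}{\sigma_2^2} + \frac{x_{1.2}^2}{\sigma_{1.2}^2} = \frac{x_2^2}{\sigma_2^2} + \frac{\left(x_1 - r\frac{\sigma_1}{\sigma_2}x_2\right)^2}{\sigma_1^2(1 - r^2)}$$

or

$$\frac{x_2^2}{\sigma_2^2} + \frac{x_{1.2}^2}{\sigma_{1.2}^2} = \frac{x_1^2}{\sigma_1^2(1 - r^2)} + \frac{x_2^2}{\sigma_2^2(1 - r^2)} - \frac{2r x_1 x_2}{\sigma_1\sigma_2(1 - r^2)} = \frac{x_1^2}{\sigma_{1.2}^2} + \frac{x_2^2}{\sigma_{2.1}^2} - \frac{2r x_1 x_2}{\sigma_{1.2}\sigma_{2.1}}$$

Evidently we would also have arrived at precisely the same
expression if we had taken the distribution of frequency for $x_1$
and $x_{2.1}$, and reduced the exponent $\frac{x_1^2}{\sigma_1^2} + \frac{x_{2.1}^2}{\sigma_{2.1}^2}$.
We have, therefore, the general expression for the normal
correlation surface for two variables

$$y_{12} = y_0\, e^{-\frac{1}{2}\left(\frac{x_1^2}{\sigma_{1.2}^2} - \frac{2r x_1 x_2}{\sigma_{1.2}\sigma_{2.1}} + \frac{x_2^2}{\sigma_{2.1}^2}\right)} \qquad (6)$$

Further, since $x_2$ and $x_{1.2}$, $x_1$ and $x_{2.1}$, are independent, we must
have

$$y_0 = \frac{N}{2\pi\sigma_2\sigma_{1.2}} = \frac{N}{2\pi\sigma_1\sigma_{2.1}} = \frac{N}{2\pi\sigma_1\sigma_2\sqrt{1 - r^2}} \qquad (7)$$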
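The reduction of the exponent from (5) to (6) can be checked numerically. A sketch with hypothetical values of $\sigma_1$, $\sigma_2$ and $r$; it evaluates both forms at random points and reports the largest discrepancy, which should be at the level of rounding error:

```python
import math
import random


def exponent_5(x1, x2, s1, s2, r):
    """x2^2/s2^2 + x_{1.2}^2/s_{1.2}^2, with x_{1.2} = x1 - b12*x2 and b12 = r*s1/s2."""
    b12 = r * s1 / s2
    s1_2 = s1 * math.sqrt(1 - r * r)
    return x2 ** 2 / s2 ** 2 + (x1 - b12 * x2) ** 2 / s1_2 ** 2


def exponent_6(x1, x2, s1, s2, r):
    """The symmetrical form of equation (6)."""
    s1_2 = s1 * math.sqrt(1 - r * r)
    s2_1 = s2 * math.sqrt(1 - r * r)
    return x1 ** 2 / s1_2 ** 2 - 2 * r * x1 * x2 / (s1_2 * s2_1) + x2 ** 2 / s2_1 ** 2


rng = random.Random(1)
s1, s2, r = 2.0, 3.0, -0.45          # hypothetical values
worst = max(
    abs(exponent_5(x1, x2, s1, s2, r) - exponent_6(x1, x2, s1, s2, r))
    for x1, x2 in ((rng.uniform(-8, 8), rng.uniform(-8, 8)) for _ in range(1000))
)
print(worst)
```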
4. If we assign to $x_2$ some fixed value, say $h_2$, we have the
distribution of the array of $x_1$'s of type $h_2$,

$$y_{12} = y_0\, e^{-\frac{1}{2}\left(\frac{x_1^2}{\sigma_{1.2}^2} - \frac{2r x_1 h_2}{\sigma_{1.2}\sigma_{2.1}} + \frac{h_2^2}{\sigma_{2.1}^2}\right)} = y_0\, e^{-h_2^2/2\sigma_2^2}\, e^{-\frac{1}{2\sigma_{1.2}^2}\left(x_1 - r\frac{\sigma_1}{\sigma_2}h_2\right)^2}$$

This is a normal distribution of standard-deviation $\sigma_{1.2}$, with a
mean deviating by $r\frac{\sigma_1}{\sigma_2}h_2$ from the mean of the whole
distribution of $x_1$'s. As $h_2$ represents any value whatever of $x_2$, we see
(1) that the standard-deviations of all arrays of $x_1$ are the same,
        <pb n="346" />
and equal to $\sigma_{1.2}$, (2) that the regression of $x_1$ on $x_2$ is strictly
linear. Similarly, of course, if we assign to $x_1$ any value $h_1$, we
will find (1) that the standard-deviations of all arrays of $x_2$ are
the same: (2) that the regression of $x_2$ on $x_1$ is strictly linear.
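The two results of this section, a constant array standard-deviation $\sigma_{1.2}$ and an array mean $r(\sigma_1/\sigma_2)h_2$, can be confirmed by summing the surface (6) numerically along an array. A sketch with hypothetical parameter values, a simple fine-grid sum standing in for integration:

```python
import math

s1, s2, r = 2.0, 3.0, 0.6                  # hypothetical values
s1_2 = s1 * math.sqrt(1 - r * r)           # sigma_{1.2}
s2_1 = s2 * math.sqrt(1 - r * r)           # sigma_{2.1}


def surface(x1, x2):
    """Normal correlation surface (6), omitting the constant factor y0."""
    q = x1 ** 2 / s1_2 ** 2 - 2 * r * x1 * x2 / (s1_2 * s2_1) + x2 ** 2 / s2_1 ** 2
    return math.exp(-q / 2)


def array_moments(h2, step=0.01, half_width=30.0):
    """Mean and standard-deviation of the array of x1's at x2 = h2, by a fine-grid sum."""
    count = int(round(2 * half_width / step)) + 1
    xs = [-half_width + i * step for i in range(count)]
    weights = [surface(x, h2) for x in xs]
    total = sum(weights)
    mean = sum(x * w for x, w in zip(xs, weights)) / total
    var = sum((x - mean) ** 2 * w for x, w in zip(xs, weights)) / total
    return mean, math.sqrt(var)


for h2 in (-4.0, 0.0, 5.0):
    mean, sd = array_moments(h2)
    print(h2, round(mean, 6), round(r * s1 / s2 * h2, 6), round(sd, 6), round(s1_2, 6))
```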
[Fig. 50. Principal Axes and Contour Lines of the normal Correlation Surface. The diagram shows the axes of measurement $x_1$, $x_2$; M, the mean of the whole surface, which is also its summit; RR and CC, the lines of means; and the contour lines and principal axes of the normal correlation surface.]

5. The contour lines are, as in the case of independence, a
series of concentric and similar ellipses; the major and minor
axes are, however, no longer parallel to the axes of $x_1$ and $x_2$, but
make a certain angle with them. Fig. 50 illustrates the
calculated form of the contour lines for one case, RR and CC being
the lines of regression. As each line of regression cuts every
        <pb n="347" />
array of $x_1$ or of $x_2$ in its mean, and as the distribution of every
array is symmetrical about its mean, RR must bisect every
horizontal chord and CC every vertical chord, as illustrated
by the two chords shown by dotted lines: it also follows that
RR cuts all the ellipses in the points of contact of the horizontal
tangents to the ellipses, and CC in the points of contact of
the vertical tangents. The surface or solid itself, somewhat
truncated, is shown in fig. 29, p. 166.

6. Since, as we see from fig. 50, a normal surface for two
correlated variables may be regarded merely as a certain surface
for which $r$ is zero turned round through some angle, and since
for every angle through which it is turned the distributions of all
$x_1$ arrays and $x_2$ arrays are normal, it follows that every section
of a normal surface by a vertical plane is a normal curve, i.e. the
distributions of arrays taken at any angle across the surface are
normal. It also follows that, since the total distributions of $x_1$
and $x_2$ must be normal for every angle through which the surface
is turned, the distributions of totals given by slices or arrays
taken at any angle across a normal surface must be normal
distributions. But these would give the distributions of functions
like $ax_1 + bx_2$, and consequently (1) the distribution of any
linear function of two normally distributed variables $x_1$ and $x_2$
must also be normal; (2) the correlation between any two linear
functions of two normally distributed variables must be normal
correlation.

To find the angle $\theta$ through which the surface has been turned,
from the position for which the correlation is zero to the position
for which the coefficient has some assigned value $r$, we must use
a little trigonometry. The major and minor axes of the ellipses
are sometimes termed the principal axes. If $\xi_1$, $\xi_2$ be the
co-ordinates referred to the principal axes (the $\xi_1$-axis being the
$x_1$ axis in its new position) we have for the relation between $\xi_1$,
$\xi_2$, $x_1$, $x_2$, the angle $\theta$ being taken as positive for a rotation of
the $x_1$-axis which will make it, if continued through 90°, coincide
in direction and sense with the $x_2$-axis,

$$\xi_1 = x_1\cos\theta + x_2\sin\theta, \qquad \xi_2 = x_2\cos\theta - x_1\sin\theta \qquad (8)$$

But, since $\xi_1$, $\xi_2$ are uncorrelated, $\Sigma(\xi_1\xi_2) = 0$. Hence, multiplying
together equations (8) and summing,

$$0 = (\sigma_2^2 - \sigma_1^2)\sin 2\theta + 2r\sigma_1\sigma_2\cos 2\theta,$$

$$\tan 2\theta = \frac{2r\sigma_1\sigma_2}{\sigma_1^2 - \sigma_2^2} \qquad (9)$$
        <pb n="348" />

It should be noticed that if we define the principal axes of any
distribution for two variables as being a pair of axes at right
angles for which the variables $\xi_1$, $\xi_2$ are uncorrelated, equation
(9) gives the angle that they make with the axes of measurement
whether the distribution be normal or no.
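Equation (9) and the condition that $\xi_1$, $\xi_2$ be uncorrelated can be checked together. A sketch, assuming hypothetical values of $\sigma_1$, $\sigma_2$, $r$ and using `math.atan2` to pick one root of (9); the mean product $\Sigma(\xi_1\xi_2)/N$ is written out from (8) in terms of $\sigma_1$, $\sigma_2$ and $r$, so it applies whether or not the distribution is normal:

```python
import math


def principal_angle(s1, s2, r):
    """A root of tan 2*theta = 2*r*s1*s2 / (s1^2 - s2^2), equation (9)."""
    return 0.5 * math.atan2(2 * r * s1 * s2, s1 ** 2 - s2 ** 2)


def mean_product_xi(s1, s2, r, theta):
    """Sum(xi1*xi2)/N for xi1 = x1 cos + x2 sin, xi2 = x2 cos - x1 sin (equation 8)."""
    c, s = math.cos(theta), math.sin(theta)
    return (s2 ** 2 - s1 ** 2) * c * s + r * s1 * s2 * (c * c - s * s)


for s1, s2, r in ((2.0, 2.5, 0.5), (1.0, 3.0, -0.3), (4.0, 1.5, 0.8)):
    theta = principal_angle(s1, s2, r)
    print(round(math.degrees(theta), 2), abs(mean_product_xi(s1, s2, r, theta)))
```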

7. The two standard-deviations, say $\Sigma_1$ and $\Sigma_2$, about the
principal axes are of some interest, for evidently from § 2 the
major and minor axes of the contour-ellipses are proportional
to these two standard-deviations. They may be most readily
determined as follows. Squaring the two transformation equations
(8), summing and adding, we have

$$\Sigma_1^2 + \Sigma_2^2 = \sigma_1^2 + \sigma_2^2 \qquad (10)$$

Referring the surface to the axes of measurement, we have for
the central ordinate by equation (7)

$$y_0 = \frac{N}{2\pi\sigma_1\sigma_2\sqrt{1 - r^2}}$$

Referring it to the principal axes, by equation (3)

$$y_0 = \frac{N}{2\pi\Sigma_1\Sigma_2}$$

But these two values of the central ordinate must be equal,
therefore

$$\Sigma_1\Sigma_2 = \sigma_1\sigma_2\sqrt{1 - r^2} \qquad (11)$$

(10) and (11) are a pair of simultaneous equations from which
$\Sigma_1$ and $\Sigma_2$ may be very simply obtained in any arithmetical case.
Care must, however, be taken to give the correct signs to the
square root in solving. $\Sigma_1 + \Sigma_2$ is necessarily positive, and $\Sigma_1 - \Sigma_2$
also if $r$ is positive, the major axes of the ellipses lying along $\xi_1$:
but if $r$ be negative, $\Sigma_1 - \Sigma_2$ is negative. It should be noted
that, while we have deduced (11) from a simple consideration
depending on the normality of the distribution, it is really of
general application (like equation 10), and may be obtained at
somewhat greater length from the equations for transforming
co-ordinates.
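Solving (10) and (11) in an arithmetical case is quickly sketched. The sign rule for $\Sigma_1 - \Sigma_2$ follows the text; as an independent check (our addition), the result is compared with the standard-deviations about the axes computed directly from (8) for an angle satisfying (9). Parameter values are hypothetical:

```python
import math


def principal_sds(s1, s2, r):
    """Solve (10) S1^2 + S2^2 = s1^2 + s2^2 and (11) S1*S2 = s1*s2*sqrt(1 - r^2)."""
    total = s1 ** 2 + s2 ** 2
    product = s1 * s2 * math.sqrt(1 - r * r)
    plus = math.sqrt(total + 2 * product)              # S1 + S2, always positive
    minus = math.sqrt(max(0.0, total - 2 * product))   # size of S1 - S2
    sign = 1.0 if r >= 0 else -1.0                     # sign rule as stated in the text
    return (plus + sign * minus) / 2, (plus - sign * minus) / 2


def sds_about_rotated_axes(s1, s2, r):
    """Standard-deviations of xi1, xi2 from (8), with theta a root of (9)."""
    theta = 0.5 * math.atan2(2 * r * s1 * s2, s1 ** 2 - s2 ** 2)
    c, s = math.cos(theta), math.sin(theta)
    var1 = s1 ** 2 * c * c + 2 * r * s1 * s2 * c * s + s2 ** 2 * s * s
    var2 = s1 ** 2 * s * s - 2 * r * s1 * s2 * c * s + s2 ** 2 * c * c
    return math.sqrt(var1), math.sqrt(var2)


S1, S2 = principal_sds(2.0, 2.5, 0.5)
print(round(S1, 4), round(S2, 4), [round(v, 4) for v in sds_about_rotated_axes(2.0, 2.5, 0.5)])
```

The comparison is made on the pair of values regardless of order, since which axis is labelled $\xi_1$ depends on the root of (9) chosen.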

8. As stated in Chap. XV. § 13, the frequency-distribution
for any variable may be expected to be approximately normal
if that variable may be regarded as the sum (or, within limits,
some slightly more complex function) of a large number of other
variables, provided that these elementary component variables
are independent, or nearly so. Similarly, the correlation between
two variables may be expected to be approximately normal if
        <pb n="349" />
each of the two variables may be regarded as the sum, or some
slightly more complex function, of a large number of elementary
component variables, the intensity of correlation depending on
the proportion of the components common to the two variables.

Stature is a highly compound character of this kind, and we
have seen that, in one instance at least, the distribution of stature
for a number of adults is given approximately by the normal
curve. We can now utilise Table III., Chap. IX., p. 160, showing
the correlation between stature of father and son, to test, as far
as we can by elementary methods, whether the normal surface
will fit the distribution of the same character in pairs of indi-
viduals : we leave it to the student to test, as far as he can do so
by simple graphical methods, the approximate normality of the
total distributions for this table. The first important property
of the normal distribution is the linearity of the regression.
This was well illustrated in fig. 37, p. 174, and the closeness of
the regression to linearity was confirmed by the values of
the correlation-ratios (p. 206), viz. 0.52 in each case as
compared with a correlation of 0.51. Subject to some
investigation as to the possibility of the deviations that do occur
arising as fluctuations of simple sampling, when drawing
samples from a record for which the regression is strictly
linear, we may conclude that the regression is appreciably
linear.

9. The second important property of the normal distribution
for two variables is the constancy of the standard-deviation for
all parallel arrays. We gave in Chap. X. p. 204 the
standard-deviations of ten of the columns of the present table, from the
column headed 62.5-63.5 onwards; these were:

2.56   2.11   2.55   2.24   2.23
2.60   2.26   2.26   2.45   2.33

the mean being 2.36. The standard-deviations again only fluctuate
irregularly round their mean value. The mean of the first five
is 2.34, of the second five 2.38, a difference of only 0.04: of the
first group, two are greater and three are less than the mean,
and the same is true of the second group. There does not seem
to be any indication of a general tendency for the standard-
deviation to increase or decrease as we pass from one end of the
table to the other. We are not yet in a position to test how
far the differences from the average standard deviation might
arise in sampling from a record in which the distribution was

        <pb n="350" />
strictly normal, but, as a fact, a rough test suggests that they
might have done so.
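The comparison of the two groups of five is simple arithmetic. The following Python sketch recomputes it. It assumes, as the quoted means imply, that the left-hand column of figures gives the first five arrays and the right-hand column the second five.

```python
from statistics import fmean

# Standard-deviations (inches) of ten columns of Table III., Chap. IX.
first_five = [2.56, 2.11, 2.55, 2.24, 2.23]
second_five = [2.60, 2.26, 2.26, 2.45, 2.33]

overall = fmean(first_five + second_five)   # about 2.36
mean_first = fmean(first_five)              # about 2.34
mean_second = fmean(second_five)            # about 2.38


def above_below(values, centre):
    """Count how many values lie above and how many below the centre."""
    above = sum(1 for v in values if v > centre)
    below = sum(1 for v in values if centre > v)
    return above, below


print(round(overall, 2), round(mean_first, 2), round(mean_second, 2))
print(above_below(first_five, overall), above_below(second_five, overall))
```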

10. Next we note that the distributions of all arrays of a
normal surface should themselves be normal. Owing, however,
to the small numbers of observations in any array, the distributions
of arrays are very irregular, and their normality cannot be tested
in any very satisfactory way: we can only say that they do not
exhibit any marked or regular asymmetry. But we can test the
allied property of a normal correlation-table, viz. that the totals
of arrays must give a normal distribution even if the arrays be
taken diagonally across the surface, and not parallel to either
axis of measurement (cf. § 6). From an ordinary correlation-
table we cannot find the totals of such diagonal arrays exactly,
but the totals of arrays at an angle of 45° will be given with
sufficient accuracy for our present purpose by the totals of lines
of diagonally adjacent compartments. Referring again to Table
III., Chap. IX., and forming the totals of such diagonals (running
up from left to right), we find, starting at the top left-hand
corner of the table, the following distribution (to be read down
the first column, then down the second):

0·25     78·75
2        81·25
3·25     66·5
6·25     59·25
8        42·25
9·75     30·75
17       29·25
34·5     19
42       10·75
46·25     7
60·5      4·25
67·5      3·5
85·75     1·75
87·25     1
78        0·25
94·25
Total  1078
The mean of this distribution is at 0·359 of an interval above the
centre of the interval with frequency 78: its standard-deviation
is 4·757 intervals, or, remembering that the interval is 1/√2 of
an inch, 3·364 inches. (This value may be checked directly from
the constants for the table given in Chap. IX., Question 3, p. 189,
for we have from the first of the transformation equations (8),

$$\sigma_\xi^2 = \sigma_1^2\cos^2\theta + \sigma_2^2\sin^2\theta + 2r_{12}\sigma_1\sigma_2\sin\theta\cos\theta$$

        <pb n="351" />
and inserting σ₁ = 2·72, σ₂ = 2·75, r₁₂ = 0·51, sin θ = cos θ = 1/√2,
we find $\sigma_\xi$ = 3·361.) Drawing a diagram and fitting a normal
curve we have fig. 51; the distribution is rather irregular, but the
fit is fair; certainly there is no marked asymmetry, and, so far as
the graphical test goes, the distribution may be regarded as
appreciably normal. One of the greatest divergences of the
actual distribution from the normal curve occurs in the almost
central interval with frequency 78: the difference between the
observed and calculated frequencies is here 12 units, but the
standard error is 9·1, so that it may well have occurred as a
fluctuation of simple sampling.
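The constants of § 10 can be recomputed from the diagonal totals. The Python sketch below illustrates the arithmetic and is not Yule's own working. It assumes that the interval with frequency 78 is the fifteenth entry, reading down the first column, and it takes the interval as 1/√2 inch. It then checks the result against the transformation equation with θ = 45°. Finally, it uses Python's `statistics.NormalDist` to find the area of the fitted curve over that interval, with standard error √(Npq).

```python
import math
from statistics import NormalDist

# Diagonal totals of Table III., Chap. IX., from the top left-hand corner
# (first column of the list, then the second).
freqs = [0.25, 2, 3.25, 6.25, 8, 9.75, 17, 34.5, 42, 46.25, 60.5, 67.5,
         85.75, 87.25, 78, 94.25, 78.75, 81.25, 66.5, 59.25, 42.25, 30.75,
         29.25, 19, 10.75, 7, 4.25, 3.5, 1.75, 1, 0.25]
origin = freqs.index(78)          # interval with frequency 78 as origin
N = sum(freqs)                    # 1078
devs = [i - origin for i in range(len(freqs))]   # deviations in intervals

mean = sum(f * d for f, d in zip(freqs, devs)) / N
var = sum(f * d * d for f, d in zip(freqs, devs)) / N - mean ** 2
sd_intervals = math.sqrt(var)
sd_inches = sd_intervals / math.sqrt(2)          # interval = 1/sqrt(2) inch

# Check from the transformation equation with theta = 45 degrees,
# where cos^2 = sin^2 = 1/2 and 2 sin cos = 1.
s1, s2, r = 2.72, 2.75, 0.51
sd_check = math.sqrt((s1**2 + s2**2) / 2 + r * s1 * s2)

# Fitted normal curve: expected frequency in the interval at the origin
# and the standard error sqrt(N p q) of the observed frequency there.
curve = NormalDist(mean, sd_intervals)
p = curve.cdf(0.5) - curve.cdf(-0.5)
expected = N * p
se = math.sqrt(N * p * (1 - p))

print(round(mean, 3), round(sd_intervals, 3), round(sd_inches, 3))
print(round(sd_check, 3), round(expected, 1), round(se, 1))
```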

Fig. 51. Distribution of Frequency obtained by addition of Table III.,
Chap. IX., along Diagonals running up from left to right, fitted with a
Normal Curve.

11. So far, we have seen (1) that the regression is approxi-
mately linear; (2) that, in the arrays which we have tested, the
standard-deviations are approximately constant, or at least that
their differences are only small, irregular and fluctuating ; (3) that
the distribution of totals for one set of diagonal arrays is approxi-
mately normal. These results suggest, though they cannot
completely prove, that the whole distribution of frequency may
be regarded as approximately normal, within the limits of fluctu-
ations of sampling. We may therefore apply a more searching
test, viz. the form of the contour lines and the closeness of their
fit to the contour-ellipses of the normal surface. We can see at
once, however, that no very close fit can be expected. Since the
frequencies in the compartments of the table are small, the
standard error of any frequency is given approximately by its

        <pb n="352" />

square root (Chap. XIII. § 12), and this implies a standard error
of about 5 units at the centre of the table, 3 units for a frequency
of 9, or 2 units for a frequency of 4: such fluctuations might
cause wide divergences in the corresponding contour lines.

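Using the suffix 1 to denote the constants relating to the
distribution of stature for fathers, and 2 the same constants for
the sons,

$$N = 1078, \quad M_1 = 67.70, \quad M_2 = 68.66 \ \text{(inches)},$$
$$\sigma_1 = 2.72, \quad \sigma_2 = 2.75, \quad r_{12} = 0.51.$$

Hence we have from equation (7)

$$y'_{12} = 26.7$$

and the complete expression for the fitted normal surface is

$$y = 26.7\,\exp\left\{-\frac{1}{2(1 - 0.51^2)}\left(\frac{x_1^2}{2.72^2} - \frac{2 \times 0.51\,x_1 x_2}{2.72 \times 2.75} + \frac{x_2^2}{2.75^2}\right)\right\}$$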
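The central ordinate and the fitted surface can be evaluated directly. This minimal Python sketch assumes that deviations are measured in inches from the two means. It also assumes that each compartment of the table is one inch square, so the ordinate is a frequency per compartment. The function name `fitted_surface` is illustrative only.

```python
import math

# Constants of Table III., Chap. IX. (suffix 1 fathers, 2 sons).
N, s1, s2, r = 1078, 2.72, 2.75, 0.51

# Central ordinate: frequency per square inch at the two means.
y0 = N / (2 * math.pi * s1 * s2 * math.sqrt(1 - r * r))


def fitted_surface(x1, x2):
    """Fitted normal surface at deviations x1, x2 (inches) from the means."""
    q = (x1 / s1) ** 2 - 2 * r * x1 * x2 / (s1 * s2) + (x2 / s2) ** 2
    return y0 * math.exp(-q / (2 * (1 - r * r)))


print(round(y0, 1))  # 26.7
# With positive r, deviations of like sign are commoner than unlike.
print(fitted_surface(1, 1) > fitted_surface(1, -1))  # True
```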

The equation to any contour ellipse will be given by equating
the index of e to a constant, but it is very much easier to draw
the ellipses if we refer them to their principal axes. To do this
we must first determine θ, Σ₁ and Σ₂. From (9),

$$\tan 2\theta = -46.49,$$

whence 2θ = 91° 14′, θ = 45° 37′, the principal axes standing very
nearly at an angle of 45° with the axes of measurement,
owing to the two standard-deviations being very nearly equal.
They should be set off on the diagram, not with a protractor, but
by taking tan θ from the tables (1·022) and calculating points on
each axis on either side of the mean.
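The angle of the principal axes can be checked numerically. This sketch assumes that equation (9) has the usual form tan 2θ = 2r₁₂σ₁σ₂/(σ₁² − σ₂²), which reproduces the value −46·49 quoted. It uses `math.atan2` so that 2θ falls in the second quadrant, as in the text. The helper `to_deg_min` is hypothetical.

```python
import math

s1, s2, r = 2.72, 2.75, 0.51

# tan 2θ = 2 r σ1 σ2 / (σ1² − σ2²); the denominator is negative here,
# so atan2 places 2θ between 90 and 180 degrees.
two_theta = math.degrees(math.atan2(2 * r * s1 * s2, s1**2 - s2**2))
theta = two_theta / 2


def to_deg_min(angle):
    """Split an angle in degrees into whole degrees and rounded minutes."""
    d = int(angle)
    return d, round((angle - d) * 60)


print(round(math.tan(math.radians(two_theta)), 2))  # tan 2θ
print(to_deg_min(two_theta), to_deg_min(theta))      # (91, 14) (45, 37)
print(round(math.tan(math.radians(theta)), 3))       # tan θ
```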

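To obtain Σ₁ and Σ₂ we have from (10) and (11)

$$\Sigma_1^2 + \Sigma_2^2 = 14.961, \qquad 2\Sigma_1\Sigma_2 = 12.868.$$

Adding the two equations, subtracting the second from the first,
and taking the square roots,

$$\Sigma_1 + \Sigma_2 = 5.275, \qquad \Sigma_1 - \Sigma_2 = 1.447,$$

whence Σ₁ = 3·36, Σ₂ = 1·91; owing to the principal axes stand-
ing nearly at 45° the first value is sensibly the same as that found
for $\sigma_\xi$ in § 10. The equations to the contour ellipses, referred to
the principal axes, may therefore be written in the form

$$\frac{\xi^2}{3.36^2} + \frac{\eta^2}{1.91^2} = t^2,$$

where ξ and η are measured from the mean along the major and
minor principal axes;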

        <pb n="353" />
the semi-major and semi-minor axes being 3·36 × t and 1·91 × t
respectively. To find t for any assigned value of the frequency y we have

$$y = y'_{12}\,e^{-t^2/2}, \qquad t^2 = \frac{2(\log y'_{12} - \log y)}{\log e}.$$

Supposing that we desire to draw the three contour-ellipses for
y = 5, 10 and 20, we find t = 1·83, 1·40 and 0·76, or the following
values for the axes of the ellipses: semi-major axes, 6·15, 4·70, 2·55;
semi-minor axes, 3·50, 2·67, 1·45.

Fig. 52. Contour Lines for the Frequencies 5, 10 and 20 of the distribution
of Table III., Chap. IX., and corresponding Contour Ellipses of the fitted
Normal Surface. P₁P₁, P₂P₂, principal axes. Horizontal scale: Stature of
Father (inches).

The ellipses drawn with these axes are shown in fig. 52, very much
        <pb n="354" />

reduced, of course, from the original drawing, one of the squares
shown representing a square inch on the original. The actual
contour lines for the same frequencies are shown by the irregular
polygons superposed on the ellipses, the points on these polygons
having been obtained by simple graphical interpolation between
the frequencies in each row and each column—diagonal interpola-
tion between the frequencies in a row and the frequencies in a
column not being used. It will be seen that the fit of the two
lower contours is, on the whole, fair, especially considering the
high standard errors. In the case of the central contour, y = 20,
the fit looks very poor to the eye, but if the ellipse be compared
carefully with the table, the figures suggest that here again we
have only to deal with the effects of fluctuations of sampling.
For father’s stature = 66 in., son’s stature = 70 in., there is
a frequency of 18·75, and an increase in this much less than the
standard error would bring the actual contour outside the ellipse.
Again, for father’s stature = 68 in., son’s stature = 71 in., there
is a frequency of 19, and an increase of a single unit would give
a point on the actual contour below the ellipse. Taking the
results as a whole, the fit must be regarded as quite as good as
we could expect with such small frequencies. It is perhaps of
historical interest to note that Sir Francis Galton, working with-
out a knowledge of the theory of normal correlation, suggested
that the contour lines of a similar table for the inheritance of
stature seemed to be closely represented by a series of concentric
and similar ellipses (ref. 2): the suggestion was confirmed when
he handed the problem, in abstract terms, to a mathematician,
Mr J. D. Hamilton Dickson (ref. 4), asking him to investigate
“the Surface of Frequency of Error that would result from
these data, and the various shapes and other particulars of its
sections that were made by horizontal planes” (ref. 3, p. 102).
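The remaining arithmetic of § 11 can be reproduced in a few lines. This minimal sketch assumes that equations (10) and (11) take the forms Σ₁² + Σ₂² = σ₁² + σ₂² and Σ₁Σ₂ = σ₁σ₂√(1 − r₁₂²), which give the figures 14·961 and 12·868 above. It uses natural logarithms, which is equivalent to dividing common logarithms by log e. Like the text, it multiplies the rounded values of t by the rounded semi-axes 3·36 and 1·91.

```python
import math

N, s1, s2, r = 1078, 2.72, 2.75, 0.51

# Equations (10) and (11) as used in the text.
sum_sq = s1**2 + s2**2                            # Σ1² + Σ2²
twice_prod = 2 * s1 * s2 * math.sqrt(1 - r * r)   # 2 Σ1 Σ2
plus = math.sqrt(sum_sq + twice_prod)             # Σ1 + Σ2
minus = math.sqrt(sum_sq - twice_prod)            # Σ1 - Σ2
big1, big2 = (plus + minus) / 2, (plus - minus) / 2

y0 = N / (2 * math.pi * s1 * s2 * math.sqrt(1 - r * r))

rows = []
for y in (5, 10, 20):
    # t² = 2(log y0 - log y) / log e, i.e. 2 ln(y0 / y).
    t = round(math.sqrt(2 * math.log(y0 / y)), 2)
    semi_major = round(round(big1, 2) * t, 2)
    semi_minor = round(round(big2, 2) * t, 2)
    rows.append((y, t, semi_major, semi_minor))

for row in rows:
    print(row)
```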

12. The normal distribution of frequency for two variables is
an isotropic distribution, to which all the theorems of Chap. V.
§§ 11-12 apply. For if we isolate the four compartments of the
correlation-table common to the rows and columns centring
round values of the variables x₁, x'₁ and x₂, x'₂, we have for the ratio
of the cross-products (frequency of x₁x₂ multiplied by frequency
of x'₁x'₂, divided by frequency of x₁x'₂ multiplied by frequency of
x'₁x₂), writing y(x₁, x₂) for the frequency at x₁, x₂,

$$\frac{y(x_1, x_2)\,y(x'_1, x'_2)}{y(x_1, x'_2)\,y(x'_1, x_2)} = \exp\left\{\frac{r_{12}}{1 - r_{12}^2}\cdot\frac{(x'_1 - x_1)(x'_2 - x_2)}{\sigma_1\sigma_2}\right\}.$$

Assuming that x'₂ − x₂ has been taken of the same sign as x'₁ − x₁,
the exponent is of the same sign as r₁₂. Hence the association for

        <pb n="355" />
this group of four frequencies is also of the same sign as r₁₂, the
ratio of the cross-products being unity, or the association zero,
if r₁₂ is zero. In a normal distribution, the association is therefore
of the same sign, the sign of r₁₂, for every tetrad of frequencies
in the compartments common to two rows and two columns; that
is to say, the distribution is isotropic. It follows that every
grouping of a normal distribution is isotropic whether the class-
intervals are equal or unequal, large or small, and the sign of the
association for a normal distribution grouped down to 2 × 2-fold
form must always be the same whatever the axes of division
chosen.
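The cross-product identity of § 12 can be checked numerically for any tetrad. In this sketch the positions of the two rows and two columns are arbitrary, hypothetical values. The frequencies are taken as the ordinates of a normal surface of unit total frequency. Three values of r₁₂ show that the ratio exceeds, equals or falls short of unity according to the sign of r₁₂.

```python
import math


def surface(x1, x2, s1, s2, r):
    """Normal surface of unit total frequency; x1, x2 are deviations."""
    q = (x1 / s1) ** 2 - 2 * r * x1 * x2 / (s1 * s2) + (x2 / s2) ** 2
    norm = 2 * math.pi * s1 * s2 * math.sqrt(1 - r * r)
    return math.exp(-q / (2 * (1 - r * r))) / norm


def cross_ratio(x1, x1p, x2, x2p, s1, s2, r):
    """f(x1, x2) f(x1', x2') / (f(x1, x2') f(x1', x2)) for the surface."""
    def f(a, b):
        return surface(a, b, s1, s2, r)
    return f(x1, x2) * f(x1p, x2p) / (f(x1, x2p) * f(x1p, x2))


def predicted(x1, x1p, x2, x2p, s1, s2, r):
    """exp{ r/(1 - r^2) * (x1' - x1)(x2' - x2) / (s1 s2) }."""
    return math.exp(r / (1 - r * r) * (x1p - x1) * (x2p - x2) / (s1 * s2))


# Hypothetical tetrad: rows at x1 = -1, 2 and columns at x2 = 0.5, 3.
args = (-1.0, 2.0, 0.5, 3.0, 2.72, 2.75)
for r in (0.51, 0.0, -0.3):
    ratio = cross_ratio(*args, r)
    print(r, round(ratio, 4), math.isclose(ratio, predicted(*args, r)))
```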

These theorems are of importance in the applications of the
theory of normal correlation to the treatment of qualitative
characters which are subjected to a manifold classification. The
contingency tables for such characters are sometimes regarded as
groupings of a normal distribution of frequency, and the coefficient
of correlation is determined on this hypothesis by a rather lengthy
procedure (ref. 14). Before applying this procedure it is well,
therefore, to see whether the distribution of frequency may be
regarded as approximately isotropic, or reducible to isotropic form
by some alteration in the order of rows and columns (Chap. V.
§§ 9-10). If only reducible to isotropic form by some rearrange-
ment, this rearrangement should be effected before grouping the
table to 2 × 2-fold form for the calculation of the correlation
coefficient by the process referred to. If the table is not reducible
to isotropic form by any rearrangement, the process of calculating
the coefficient of correlation on the assumption of normality is to
be avoided. Clearly, even if the table be isotropic it need not be
normal, but at least the test for isotropy affords a rapid and
simple means for excluding certain distributions which are not
even remotely normal. Table II. of Chap. V. might possibly be
regarded as a grouping of normally distributed frequency if re-
arranged as suggested in § 10 of the same chapter—it would be
worth the investigator’s while to proceed further and compare
the actual distribution with a fitted normal distribution—but
Table IV. could not be regarded as normal, and could not be
rearranged so as to give a grouping of normally distributed
frequency.

13. If the frequencies in a contingency-table be not large, and
also if the contingency or correlation be small, the influence
of casual irregularities due to fluctuations of sampling may
render it difficult to say whether the distribution may be regarded
as essentially isotropic or no. In such cases some further con-
densation of the table by grouping together adjacent rows and
columns, or some process of “smoothing” by averaging the

        <pb n="356" />
frequencies in adjacent compartments, may be of service. The
correlation table for stature in father and son (Table III., Chap.
IX.), for instance, is obviously not strictly isotropic as it stands:
we have seen, however, that it appears to be normal, within the
limits of fluctuations of sampling, and it should consequently be
isotropic within such limits. We can apply a rough test by
regrouping the table in a much coarser form, say with four rows
and four columns: the table below exhibits such a grouping, the
limits of rows and of columns having been so fixed as to include
not less than 200 observations in each array.
TABLE I. (condensed from Table III. of Chapter IX.)

                            Father’s Stature (inches).
Son’s Stature        Under     65·5-    67·5-    69·5 and
(inches).            65·5      67·5     69·5     over        Total
Under 66·5            97·5      74·25    34·75     10·5       217
66·5-68·5             76·5     108       85        52         321·5
68·5-70·5             33·25     64·75    95        84·5       277·5
70·5 and over         14·75     32·5     80·75    134         262
Total                222       279·5    295·5     281        1078

Taking the ratio of the frequency in col. 1 to the sum of the
frequencies in cols. 1 and 2 for each successive row, and so on for
the other pairs of columns, we find the following series of ratios:

TABLE II. Ratio of Frequency in Column m to (Frequency in Column m
+ Frequency in Column m + 1) in Table I.

                 Columns
Row      1 and 2    2 and 3    3 and 4
1        0·568      0·681      0·768
2        0·415      0·560      0·620
3        0·339      0·405      0·529
4        0·312      0·287      0·376
These ratios decrease continuously as we pass from the top to the
bottom of the table, and the distribution, as condensed, is therefore

        <pb n="357" />
isotropic. The student should form one or two other condensations
of the original table to 3 × 3- or 4 × 4-fold form: he will probably
find them either isotropic, or diverging so slightly from isotropy
that an alteration of the frequencies, well within the margin of
possible fluctuations of sampling, will render the distribution
isotropic.
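The isotropy test of § 13 can be sketched in Python. The functions below are illustrative only. `column_ratios` reproduces Table II. `is_isotropic` assumes, following Chap. V., that it is enough to check that every tetrad formed from adjacent rows and adjacent columns shows association of one sign. Ratios that fall down a column amount to positive association in each such tetrad, since a/(a + b) exceeding c/(c + d) is equivalent to ad exceeding bc.

```python
# Table I.: rows are sons' stature groups and columns fathers' stature
# groups, both in increasing order of stature.
table = [
    [97.5, 74.25, 34.75, 10.5],
    [76.5, 108, 85, 52],
    [33.25, 64.75, 95, 84.5],
    [14.75, 32.5, 80.75, 134],
]


def column_ratios(rows):
    """Table II.: f(row, m) / (f(row, m) + f(row, m + 1)) for each row."""
    return [[round(row[m] / (row[m] + row[m + 1]), 3)
             for m in range(len(row) - 1)] for row in rows]


def tetrad_signs(rows):
    """Signs of (ad - bc) for every tetrad of adjacent rows and columns."""
    signs = set()
    for i in range(len(rows) - 1):
        for j in range(len(rows[0]) - 1):
            a, b = rows[i][j], rows[i][j + 1]
            c, d = rows[i + 1][j], rows[i + 1][j + 1]
            diff = a * d - b * c
            signs.add((diff > 0) - (0 > diff))
    return signs


def is_isotropic(rows):
    """Isotropic if no two adjacent tetrads show opposite association."""
    signs = tetrad_signs(rows)
    return not (1 in signs and -1 in signs)


for row in column_ratios(table):
    print(row)
print(is_isotropic(table))
```

Interchanging the two middle columns of Table I. gives a table that fails this test, because some tetrads then show negative association while others show positive.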

14. Before concluding this chapter we may note briefly some
of the principal properties of the normal distribution of frequency
for any number of variables, referring the student for proofs to
the original memoirs. Denoting the frequency of the combination
of deviations x₁, x₂, x₃, …, xₙ by y₁₂…ₙ, we must have,
in the notation of Chapter XII., if the uncorrelated deviations x₁,
x₂.₁, x₃.₁₂, etc. be completely independent (cf. § 3 of the present
chapter),

$$y_{12\ldots n} = y'_{12\ldots n}\,e^{-\frac{1}{2}\phi} \qquad (12)$$

where

$$\phi = \frac{x_1^2}{\sigma_1^2} + \frac{x_{2.1}^2}{\sigma_{2.1}^2} + \frac{x_{3.12}^2}{\sigma_{3.12}^2} + \cdots + \frac{x_{n.12\ldots(n-1)}^2}{\sigma_{n.12\ldots(n-1)}^2} \qquad (13)$$

and

$$y'_{12\ldots n} = \frac{N}{(2\pi)^{n/2}\,\sigma_1\,\sigma_{2.1}\,\sigma_{3.12}\cdots\sigma_{n.12\ldots(n-1)}} \qquad (14)$$

The expression (13) for the exponent φ may be reduced to a
general form corresponding to that given for two variables, viz.

$$\phi = \frac{x_1^2}{\sigma_{1.23\ldots n}^2} + \frac{x_2^2}{\sigma_{2.13\ldots n}^2} + \cdots + \frac{x_n^2}{\sigma_{n.12\ldots(n-1)}^2} - \frac{2r_{12.34\ldots n}\,x_1 x_2}{\sigma_{1.23\ldots n}\,\sigma_{2.13\ldots n}} - \cdots - \frac{2r_{(n-1)n.12\ldots(n-2)}\,x_{n-1} x_n}{\sigma_{(n-1).12\ldots(n-2)n}\,\sigma_{n.12\ldots(n-1)}} \qquad (15)$$

Several important results may be deduced directly from the form
(13) for the exponent. Clearly this might have been written in
a great variety of ways, commencing with any deviation of the
first order, allotting any primary subscript to the second deviation
(except the subscript of the first), and so on, just as in § 3 we
arrived at precisely the same final form for the exponent whether
we started with the two deviations x₁ and x₂.₁ or with x₂ and x₁.₂.
Our assumption, then, that the deviations x₁, x₂.₁, x₃.₁₂, etc. are
normally distributed amounts to the assumption that all devia-
tions of any order and with any suffixes are normally distributed,
i.e. in the general normal distribution for n variables every array
of every order is a normal distribution. It will also follow, gen-
eralising the deduction of § 6, that any linear function of x₁, x₂,
…, xₙ is normally distributed. Further, if in (13) any fixed

        <pb n="358" />

values be assigned to x₃.₁₂ and all the following deviations, the
correlation between x₁ and x₂ on expanding x₂.₁ is, as we have
seen, normal correlation. Similarly, if any fixed values be
assigned to x₁ and x₂.₁, to x₅.₁₂₃₄ and all the following deviations,
on reducing x₄.₁₂₃ to the second order we shall find that the correla-
tion between x₃.₁₂ and x₄.₁₂ is normal correlation, the correlation
coefficient being r₃₄.₁₂, and so on. That is to say, using k to
denote any group of secondary suffixes, (1) the correlation between
any two deviations x₁.ₖ and x₂.ₖ is normal correlation; (2) the correla-
tion between the said deviations is r₁₂.ₖ, whatever the particular
fixed values assigned to the remaining deviations. The latter
conclusion, it will be seen, renders the meaning of partial
correlation coefficients much more definite in the case of normal
correlation than in the general case. In the general case r₁₂.ₖ
represents merely the average correlation, so to speak, between
x₁.ₖ and x₂.ₖ: in the normal case r₁₂.ₖ is constant for all the sub-
groups corresponding to particular assigned values of the other
variables. Thus in the case of three variables which are normally
correlated, if we assign any given value to x₃ the correlation
between the associated values of x₁ and x₂ is r₁₂.₃: in the general
case the correlation between x₁ and x₂, if actually worked out for
the various sub-groups corresponding, say, to increasing values of
x₃, would probably exhibit some continuous change, increasing or
decreasing as the case might be. Finally, we have to note that if,
in the expression (15) for φ, we assign fixed values, say h₂, h₃, etc.,
to all the deviations except x₁, and then throw φ into the form of a
perfect square (as in § 4 for the case of two variables), we obtain a
normal distribution for x₁ in which the mean is displaced by

$$\bar{x}_1 = r_{12.34\ldots n}\frac{\sigma_{1.23\ldots n}}{\sigma_{2.13\ldots n}}h_2 + r_{13.24\ldots n}\frac{\sigma_{1.23\ldots n}}{\sigma_{3.124\ldots n}}h_3 + \cdots + r_{1n.23\ldots(n-1)}\frac{\sigma_{1.23\ldots n}}{\sigma_{n.12\ldots(n-1)}}h_n.$$

But this is a linear function of h₂, h₃, etc.; therefore in the case of
normal correlation the regression of any one variable on any or all
of the others is strictly linear. The expressions r₁₂.₃₄…ₙ σ₁.₂₃…ₙ/σ₂.₁₃…ₙ,
etc., are of course the partial regressions b₁₂.₃₄…ₙ, etc.
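The statement that the coefficients of h₂, h₃, etc. are the partial regressions can be checked for three variables. The sketch below uses hypothetical constants. It computes b₁₂.₃ and b₁₃.₂ from the partial correlations and partial standard-deviations, in the form given in the text. It then compares them with the coefficients obtained by solving the two normal equations for the regression of x₁ on x₂ and x₃ directly. The standard three-variable formulae for r₁₂.₃ and for the partial standard-deviations are assumed.

```python
import math

# Hypothetical constants for three normally correlated variables.
s1, s2, s3 = 2.0, 3.0, 1.5
r12, r13, r23 = 0.6, 0.4, 0.3

# Determinant of the correlation matrix.
det_r = 1 - r12**2 - r13**2 - r23**2 + 2 * r12 * r13 * r23

# Partial correlations of the first order, partial SDs of the second.
r12_3 = (r12 - r13 * r23) / math.sqrt((1 - r13**2) * (1 - r23**2))
r13_2 = (r13 - r12 * r23) / math.sqrt((1 - r12**2) * (1 - r23**2))
s1_23 = s1 * math.sqrt(det_r / (1 - r23**2))
s2_13 = s2 * math.sqrt(det_r / (1 - r13**2))
s3_12 = s3 * math.sqrt(det_r / (1 - r12**2))

# Partial regressions in the form given in the text.
b12_3 = r12_3 * s1_23 / s2_13
b13_2 = r13_2 * s1_23 / s3_12

# Direct solution of the normal equations for x1 on x2 and x3:
#   c22*beta2 + c23*beta3 = c12,  c23*beta2 + c33*beta3 = c13.
c22, c33, c23 = s2**2, s3**2, r23 * s2 * s3
c12, c13 = r12 * s1 * s2, r13 * s1 * s3
det = c22 * c33 - c23**2
beta2 = (c12 * c33 - c13 * c23) / det
beta3 = (c13 * c22 - c12 * c23) / det

print(round(b12_3, 5), round(beta2, 5))
print(round(b13_2, 5), round(beta3, 5))
```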

REFERENCES.
General.

(1) Bravais, A., “Analyse mathématique sur les probabilités des erreurs de
situation d’un point,” Acad. des Sciences: Mémoires présentés par divers
savants, IIe série, ix., 1846, p. 255.

(2) Galton, Francis, “Family Likeness in Stature,” Proc. Roy. Soc., vol. xl.,
1886, p. 42.

(3) Galton, Francis, Natural Inheritance; Macmillan &amp; Co., 1889.
        <pb n="359" />

(4) Dickson, J. D. Hamilton, Appendix to (2), Proc. Roy. Soc., vol. xl.,
1886, p. 63.

(5) Edgeworth, F. Y., “On Correlated Averages,” Phil. Mag., 5th Series,
vol. xxxiv., 1892, p. 190.

(6) Pearson, Karl, “Regression, Heredity, and Panmixia,” Phil. Trans.
Roy. Soc., Series A, vol. clxxxvii., 1896, p. 253.

(7) Pearson, Karl, “On Lines and Planes of Closest Fit to Systems of Points
in Space,” Phil. Mag., 6th Series, vol. ii., 1901, p. 559. (On the fitting
of “principal axes” and the corresponding planes in the case of more
than two variables.)

(8) Pearson, Karl, “On the Influence of Natural Selection on the Variability
and Correlation of Organs,” Phil. Trans. Roy. Soc., Series A, vol. cc.,
1902, p. 1. (Based on the assumption of normal correlation.)

(9) Pearson, Karl, and Alice Lee, “On the Generalised Probable Error in
Multiple Normal Correlation,” Biometrika, vol. vi., 1908, p. 59.

(10) Yule, G. U., “On the Theory of Correlation,” Jour. Roy. Stat. Soc.,
vol. lx., 1897, p. 812.

(11) Yule, G. U., “On the Theory of Correlation for any number of Variables,
treated by a New System of Notation,” Proc. Roy. Soc., Series A, vol.
lxxix., 1907, p. 182.

(12) Sheppard, W. F., “On the Application of the Theory of Error to Cases
of Normal Distribution and Normal Correlation,” Phil. Trans. Roy.
Soc., Series A, vol. cxcii., 1898, p. 101.

(13) Sheppard, W. F., “On the Calculation of the Double-integral express-
ing Normal Correlation,” Cambridge Phil. Trans., vol. xix., 1900, p. 23.

Applications to the Theory of Attributes, etc.

(14) Pearson, Karl, “On the Correlation of Characters not Quantitatively
Measurable,” Phil. Trans. Roy. Soc., Series A, vol. cxcv., 1900, p. 1.
(Cf. criticism in ref. 3 of Chap. III.)

(15) Pearson, Karl, “On a New Method of Determining Correlation between
a Measured Character A and a Character B, of which only the Percent-
age of Cases wherein B exceeds (or falls short of) a given Intensity is
recorded for each grade of A,” Biometrika, vol. vii., 1909, p. 96.

(16) Pearson, Karl, “On a New Method of Determining Correlation, when
one Variable is given by Alternative and the other by Multiple
Categories,” Biometrika, vol. vii., 1910, p. 248.

See also the memoir (12) by Sheppard.

Various Methods and their Relation to Normal Correlation.

(17) Pearson, Karl, “On the Theory of Contingency and its Relation to
Association and Normal Correlation,” Drapers’ Company Research
Memoirs, Biometric Series I.; Dulau &amp; Co., London, 1904.

(18) Pearson, Karl, “On Further Methods of Determining Correlation,”
Drapers’ Company Research Memoirs, Biometric Series IV. (Methods
based on correlation of ranks: difference methods.) Dulau &amp; Co.,
London, 1907.

(19) Spearman, C., “A Footrule for Measuring Correlation,” Brit. Jour. of
Psychology, vol. ii., 1906, p. 89. (The suggestion of a “rank” method:
see Pearson’s criticism and improved formula in (18) and Spearman’s
reply on some points in (20).)

(20) Spearman, C., “Correlation calculated from Faulty Data,” Brit. Jour.
of Psychology, vol. iii., 1910, p. 271.

(21) Thorndike, E. L., “Empirical Studies in the Theory of Measurement,”
Archives of Psychology (New York), 1907.

        <pb n="360" />
EXERCISES.

1. Deduce equation (11) from the equations for transformation of co-ordinates
without assuming the normal distribution. (A proof will be found in ref. 10.)

2. Hence show that if the pairs of observed values of x₁ and x₂ are repre-
sented by points on a plane, and a straight line drawn through the mean, the
sum of the squares of the distances of the points from this line is a minimum
if the line is the major principal axis.

3. The coefficient of correlation with reference to the principal axes being
zero, and with reference to other axes something, there must be some pair of
axes at right angles for which the correlation is a maximum, i.e. is numerically
greatest without regard to sign. Show that these axes make an angle of 45°
with the principal axes, and that the maximum value of the correlation is

$$\frac{\Sigma_1^2 - \Sigma_2^2}{\Sigma_1^2 + \Sigma_2^2}.$$

4. (Sheppard, ref. 12.) A fourfold table is formed from a normal correla-
tion table, taking the points of division between A and α, B and β, at the
medians, so that (A) = (α) = (B) = (β) = N/2. Show that

$$r = \cos\left(\frac{(A\beta)}{(AB) + (A\beta)}\,\pi\right).$$
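Exercises 3 and 4 can be checked numerically before they are proved. The sketch below is not a proof. For Exercise 3 it scans rotations in steps of 0·01°, using the illustrative values Σ₁ = 3·36 and Σ₂ = 1·91 from § 11. For Exercise 4 it simulates a normal correlation table with an assumed r = 0·5, divides it at the sample medians and applies Sheppard's formula. The result agrees with 0·5 only within sampling error, and the seed and number of pairs are arbitrary.

```python
import math
import random
from statistics import median

# Exercise 3: correlation referred to axes turned through an angle a
# (degrees) from the principal axes; illustrative values from § 11.
S1, S2 = 3.36, 1.91


def corr_at(a_deg):
    a = math.radians(a_deg)
    c, s = math.cos(a), math.sin(a)
    var_u = S1**2 * c * c + S2**2 * s * s
    var_v = S1**2 * s * s + S2**2 * c * c
    cov = (S2**2 - S1**2) * s * c
    return cov / math.sqrt(var_u * var_v)


angles = [k / 100 for k in range(9001)]          # 0 to 90 degrees
best_angle = max(angles, key=lambda a: abs(corr_at(a)))
formula = (S1**2 - S2**2) / (S1**2 + S2**2)
print(best_angle, round(abs(corr_at(best_angle)), 4), round(formula, 4))

# Exercise 4: simulate a normal correlation table with r = 0.5, divide
# at the sample medians and apply Sheppard's formula.
rng = random.Random(1)
r, n = 0.5, 100_000
pairs = []
for _ in range(n):
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    pairs.append((z1, r * z1 + math.sqrt(1 - r * r) * z2))
m1 = median(x for x, _ in pairs)
m2 = median(y for _, y in pairs)
n_AB = sum(1 for x, y in pairs if x > m1 and y > m2)       # (AB)
n_Abeta = sum(1 for x, y in pairs if x > m1 and m2 >= y)   # (A beta)
r_sheppard = math.cos(math.pi * n_Abeta / (n_AB + n_Abeta))
print(round(r_sheppard, 3))
```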

        <pb n="361" />
CHAPTER XVII.
THE SIMPLER CASES OF SAMPLING FOR VARIABLES:
PERCENTILES AND MEAN.

1-2. The problem of sampling for variables; the conditions assumed –
3. Standard error of a percentile – 4. Special values for the percentiles
of a normal distribution – 5. Effect of the form of the distribution
generally – 6. Simplified formula for the case of a grouped frequency-
distribution – 7. Correlation between errors in two percentiles of the
same distribution – 8. Standard error of the interquartile range for the
normal curve – 9. Effect of removing the restrictions of simple sampling,
and limitations of interpretation – 10. Standard error of the arithmetic
mean – 11. Relative stability of mean and median in sampling – 12.
Standard error of the difference between two means – 13. The tendency
to normality of a distribution of means – 14. Effect of removing the re-
strictions of simple sampling – 15. Statement of the standard errors of
standard-deviation, coefficient of variation, correlation coefficient and
regression, correlation-ratio and criterion for linearity of regression – 16.
Restatement of the limitations of interpretation if the sample be small.

1. In Chapters XIII.-XVI. we have been concerned solely with
the theory of sampling for the case of attributes and the frequency-
distributions appropriate to that case. We now proceed to
consider some of the simpler theorems for the case of variables
(cf. Chap. XIII. § 2). Suppose that we have a bag containing a
practically infinite number of tickets or cards bearing the recorded
values of some variable X, and that we draw a ticket from this
bag, note the value that it bears, draw another, and so on until
we have drawn n cards (a number small compared with the whole
number in the bag). Let us continue this process until we have
k such samples of n cards each, and then work out the mean,
standard-deviation, median, etc., for each of the samples. No one
of these measures will prove to be absolutely the same for every
sample, and our problem is to determine the standard-deviation
that each such measure will exhibit.
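The experiment of § 1 is easily imitated by simulation. In this sketch a normal variable with a hypothetical mean of 68 and standard-deviation of 2·7 stands in for the bag. Each card is drawn independently from that same distribution. The sample size n, the number of samples k and the random seed are arbitrary choices.

```python
import random
from statistics import fmean, median, pstdev

# Hypothetical record: normal with mean 68 and standard-deviation 2.7,
# standing in for the practically infinite bag of tickets.
rng = random.Random(0)
n, k = 100, 2000              # cards per sample, number of samples

sample_means, sample_medians = [], []
for _ in range(k):
    sample = [rng.gauss(68, 2.7) for _ in range(n)]
    sample_means.append(fmean(sample))
    sample_medians.append(median(sample))

# Neither measure is the same from sample to sample; its standard-
# deviation over the k samples is the quantity the chapter studies.
spread_mean = pstdev(sample_means)
spread_median = pstdev(sample_medians)
print(round(spread_mean, 3), round(spread_median, 3))
```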

2. In solving this problem, we must be careful to define
precisely the conditions which are assumed to subsist, so as to
realise the limitations of any solution obtained. These conditions
        <pb n="362" />
were discussed very fully for the case of attributes (Chap. XIII.
§ 8), and we would refer the student to the discussion then given.
Here it is sufficient to state the assumptions briefly, using the
letters (a), (b) and (c) to denote the corresponding assumptions
indicated by the same letters in the section cited.

(a) We assume that we are drawing from precisely the same
record throughout the experiment, so that the chance of drawing
a card with any given value of X, or a value within any assigned
limits, is the same at each sampling.

(b) We assume not only that we are drawing from the same
record throughout, but that each of our cards at each drawing
may be regarded quite strictly as drawn from the same record (or
from identically similar records): e.g. if our card-record is con-
tained in a series of bundles, we must not make it a practice to
take the first card from bundle number 1, the second card from
bundle number 2, and so on, or else the chance of drawing a
card with a given value of X, or a value within assigned limits,
may not be the same for each individual card at each drawing.

(c) We assume that the drawing of each card is entirely
independent of that of every other, so that the value of X recorded
on card 1, at each drawing, is uncorrelated with the value of X
recorded on card 2, 3, 4, and so on. It is for this reason that we
spoke of the record, in § 1, as containing a practically infinite
number of cards, for otherwise the successive drawings at each
sampling would not be independent: if the bag contain ten
tickets only, bearing the numbers 1 to 10, and we draw the card
bearing 1, the average of the following cards drawn will be higher
than the mean of all cards drawn; if, on the other hand, we draw
the 10, the average of the following cards will be lower than the
mean of all cards, i.e. there will be a negative correlation between
the number on the card taken at any one drawing and the card
taken at any other drawing. Without making the number of cards
in the bag indefinitely large, we can, as already pointed out for the
case of attributes (Chap. XIII. § 3), eliminate this correlation by
replacing each card before drawing the next.
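The negative correlation in the ten-ticket example of (c) can be computed exactly. The sketch below enumerates every ordered pair of first and second cards drawn without replacement, using Python's `fractions.Fraction` for exact arithmetic. For a bag of N distinct tickets the correlation between two such draws is −1/(N − 1), here −1/9. The helper `mean_of` is hypothetical.

```python
from fractions import Fraction
from itertools import permutations

# Ten tickets numbered 1 to 10, drawn twice without replacement: every
# ordered pair of distinct tickets is equally likely.
pairs = list(permutations(range(1, 11), 2))


def mean_of(values):
    """Exact arithmetic mean of an iterable of numbers."""
    values = list(values)
    return Fraction(sum(values), len(values))


m1 = mean_of(a for a, _ in pairs)   # mean of first card, 11/2
m2 = mean_of(b for _, b in pairs)   # mean of second card, 11/2
cov = mean_of((a - m1) * (b - m2) for a, b in pairs)
var = mean_of((a - m1) ** 2 for a, _ in pairs)
corr = cov / var                    # both draws have the same variance

# Average of the second card when the first card drawn was the 1.
after_one = mean_of(b for a, b in pairs if a == 1)
print(corr, after_one)  # -1/9 6
```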

Sampling conducted under these conditions we shall, as before,
speak of as simple sampling. We do not, it should be noticed,
make the further assumption that the sample is unbiassed, i.e.
that the chance of inclusion in the sample is independent of the
value of X recorded on the card (cf. the last paragraph in § 8,
Chap. XIII., and the discussion in §§ 4-8, Chap. XIV.). This
assumption is unnecessary. If it be true, the interpretation of
our results becomes simpler and more straightforward, for we
can substitute for such phrases as “the standard-deviation of
X in a very large sample,” “the form of the frequency-distribution
        <pb n="363" />
        XVIL.—SIMPLER CASES OF SAMPLING FOR VARIABLES. 337
in a very large sample,” the phrases “the standard-deviation of
X in the original record,” “the form of the frequency-distribution
in the original record”: but in very many, perhaps the majority
of, practical cases the very question at issue is the nature of the
relation between the distribution of the sample and the distribu-
tion of the record from which it is drawn. As has already been
emphasised in the passages to which reference is made above, no
examination of samples drawn under the same conditions can
give any evidence on this head.

3. Standard Error of a Percentile. Let us consider first the
fluctuations of sampling for a given percentile, as the problem is
intimately related to that of Chaps. XIII.-XIV.

Let Xₚ be a value of X such that pN of the values of X in
an indefinitely large sample drawn under the same conditions lie
above it and qN below it.

If we note the proportions of observations above Xₚ in samples
of n drawn from the record, we know that these observed values
will tend to centre round p as mean, with a standard-deviation
√(pq/n). If now at each drawing, as well as observing the pro-
portion of X’s above Xₚ, say p + δ, for the sample, we also proceed
to note the adjustment ε required in Xₚ to make the proportion
of observations above Xₚ + ε in the sample p, the standard-
deviation of ε will bear to the standard-deviation of δ the same
ratio that ε on an average bears to δ. But this ratio is quite
simply determinable if the number of observations in the sample
is sufficiently large to justify us in assuming that δ is small, so
small that we may regard the element of the frequency curve
(for a very large sample) over which Xₚ + ε ranges as approximately
a rectangle. If this assumption be made, and we denote the
standard-deviation of X in a very large sample by σ, and the
ordinate of the frequency curve at Xₚ, when drawn with unit area
and unit standard-deviation, by yₚ,

$$\epsilon = \frac{\sigma}{y_p}\,\delta.$$

Therefore for the standard-deviation of ε, or of the percentile
corresponding to a proportion p, we have

$$\sigma_\epsilon = \frac{\sigma}{y_p}\sqrt{\frac{pq}{n}} \qquad (1)$$
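Equation (1) can be written as a small function. This is a sketch under stated assumptions. The curve defaults to the normal curve through `statistics.NormalDist`, and the name `se_percentile` is hypothetical. A rough Monte Carlo check for the upper quartile (p = 0·25) follows, with the sample size, number of samples and seed chosen arbitrarily. The simulated spread should agree with the formula only approximately, as expected for finite n.

```python
import math
import random
from statistics import NormalDist, pstdev, quantiles


def se_percentile(sigma, p, n, curve=NormalDist()):
    """Equation (1): standard error of the percentile with proportion p
    above it, for samples of n; y_p is the ordinate of `curve` (unit area,
    unit standard-deviation) at that percentile.  Normal by default."""
    q = 1 - p
    y_p = curve.pdf(curve.inv_cdf(q))
    return sigma / y_p * math.sqrt(p * q / n)


# Rough Monte Carlo check for the upper quartile of a normal record
# with unit standard-deviation.
rng = random.Random(2)
n, k = 200, 2000
upper_quartiles = [
    quantiles([rng.gauss(0, 1) for _ in range(n)], n=4)[2] for _ in range(k)
]

print(round(se_percentile(1.0, 0.25, n), 4), round(pstdev(upper_quartiles), 4))
```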

4. If the frequency-distribution for the very large sample be a
normal curve, the values of yₚ for the principal percentiles may be
taken from the published tables. A table calculated by Mr
Sheppard (Table III., p. 9, in Tables for Statisticians and Biometricians,
        <pb n="364" />
or Table IV., ref. 16, in Appendix I.) gives the values
directly, and these have been utilised for the following: the
student can estimate the values roughly by a combined use of the
area and ordinate tables for the normal curve given in Chapter
XV., remembering to divide the ordinates given in that table by
√(2π) so as to make the area unity.

                          Value of yₚ
Median                    0·3989423
Deciles 4 and 6           0·3863425
Deciles 3 and 7           0·3476926
Deciles 2 and 8           0·2799619
Deciles 1 and 9           0·1754983
Quartiles                 0·3177766

Inserting these values of yₚ in equation (1), we have the
following values for the standard errors of the median, deciles,
etc., and the values given in the second column for their probable
errors (Chap. XV. § 17), which the student may sometimes find
useful:

                     Standard error is       Probable error is
                     σ/√n multiplied by      σ/√n multiplied by
Median                   1·25331                 0·84535
Deciles 4 and 6          1·26804                 0·85528
Deciles 3 and 7          1·31800                 0·88897
Deciles 2 and 8          1·42877                 0·96369
Deciles 1 and 9          1·70942                 1·15298
Quartiles                1·36263                 0·91908

It will be seen that the influence of fluctuations of sampling on
the several percentiles increases as we depart from the median:
the standard error of the quartiles is nearly one-tenth greater than
that of the median, and the standard error of the first or ninth
deciles more than one-third greater.
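The two tables of § 4 can be regenerated from the normal curve itself. The sketch below assumes Python 3.8 or later for `statistics.NormalDist`. It finds each percentile with `inv_cdf`, reads off the ordinate with `pdf`, and applies equation (1) with σ = 1 and n = 1. The function name is illustrative only. The printed values should agree with the tables to within about a unit in the last figure.

```python
from statistics import NormalDist

UNIT_NORMAL = NormalDist()                  # unit area, unit standard-deviation
PE_CONSTANT = UNIT_NORMAL.inv_cdf(0.75)     # 0.6744898..., converts SE to PE

def normal_percentile_factors(p):
    """Return (y_p, SE factor, PE factor) for the percentile cutting off a
    proportion p of a normal curve.  The factors multiply sigma/sqrt(n)."""
    y_p = UNIT_NORMAL.pdf(UNIT_NORMAL.inv_cdf(p))   # ordinate at the percentile
    se_factor = (p * (1.0 - p)) ** 0.5 / y_p          # equation (1), sigma = n = 1
    return y_p, se_factor, se_factor * PE_CONSTANT

rows = [("Median", 0.5), ("Deciles 4 and 6", 0.4), ("Deciles 3 and 7", 0.3),
        ("Deciles 2 and 8", 0.2), ("Deciles 1 and 9", 0.1), ("Quartiles", 0.25)]
for label, p in rows:
    y_p, se, pe = normal_percentile_factors(p)
    print(f"{label:16s}  y_p = {y_p:.7f}   SE = {se:.5f}   PE = {pe:.5f}")
```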

5. Consider further the influence of the form of the frequency-
distribution on the standard error of the median, as this is an
important form of average. For a distribution with a given
number of observations and a given standard-deviation the
standard error varies inversely as $y_p$. Hence for a distribution in
which $y_p$ is small, for example a U-shaped distribution like that
of fig. 18 or fig. 19, the standard error of the median will be
relatively high, and it will, in so far, be an undesirable form of
average to employ. On the other hand, in the case of a distribution
which has a high peak in the centre, so as to exhibit a value
of $y_p$ large compared with the standard-deviation, the standard
error of the median will be relatively low. We can create such a
        <pb n="365" />
        XVII.—SIMPLER CASES OF SAMPLING FOR VARIABLES. 339
“peaked” distribution by superposing a normal curve with a
small standard-deviation on a normal curve with the same mean
and a relatively large standard-deviation. To give some idea of
the reduction in the standard error of the median that may be
effected by a moderate change in the form of the distribution, let
us find for what ratio of the standard-deviations of two such curves,
having the same area, the standard error of the median reduces to
$\sigma/\sqrt{n}$, where $\sigma$ is of course the standard-deviation of the compound
distribution.

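Let $\sigma_1$, $\sigma_2$ be the standard-deviations of the two distributions,
and let there be n/2 observations in each. Then

$$\sigma = \sqrt{\frac{\sigma_1^2 + \sigma_2^2}{2}} \qquad (a)$$

On the other hand, the value of $y_p$ at the median is

$$y_p = \frac{\sigma}{2\sqrt{2\pi}\,\sigma_1} + \frac{\sigma}{2\sqrt{2\pi}\,\sigma_2} = \frac{\sigma}{2\sqrt{2\pi}}\cdot\frac{\sigma_1 + \sigma_2}{\sigma_1\sigma_2} \qquad (b)$$

Hence the standard error of the median is

$$\sqrt{\frac{2\pi}{n}}\cdot\frac{\sigma_1\sigma_2}{\sigma_1 + \sigma_2} \qquad (c)$$

(c) is equal to $\sigma/\sqrt{n}$ if

$$\frac{(\sigma_1 + \sigma_2)\sqrt{\sigma_1^2 + \sigma_2^2}}{2\sqrt{\pi}\,\sigma_1\sigma_2} = 1,$$

that is, writing $\sigma_1/\sigma_2 = \rho$, if

$$\frac{(1 + \rho)\sqrt{1 + \rho^2}}{2\sqrt{\pi}\,\rho} = 1,$$

or, squaring and clearing fractions,

$$\rho^4 + 2\rho^3 + (2 - 4\pi)\rho^2 + 2\rho + 1 = 0.$$

This equation may be reduced to a quadratic and solved by
taking $\rho + 1/\rho$ as a new variable. The roots found give $\rho = 2.2360\ldots$
or $0.4472\ldots$, the one root being merely the reciprocal of
the other. The standard error of the median will therefore be
$\sigma/\sqrt{n}$, in such a compound distribution, if the standard-deviation
of the one normal curve is, in round numbers, about 2¼ times
that of the other. If the ratio be greater, the standard error
of the median will be less than $\sigma/\sqrt{n}$. The distribution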
        <pb n="366" />
for which the standard error of the median is exactly equal to
$\sigma/\sqrt{n}$ is shown in fig. 53. It will be seen that it is by no means
a very striking form of distribution; at a hasty glance it might
almost be taken as normal. In the case of distributions of a form
more or less similar to that shown, it is evident that we cannot
at all safely estimate by eye alone the relative standard error of
the median as compared with $\sigma/\sqrt{n}$.
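The root quoted in § 5 is easy to reproduce. The sketch below uses plain Python, and the function name is hypothetical. It substitutes $u = \rho + 1/\rho$, which turns the quartic into $u^2 + 2u - 4\pi = 0$, and solves for ρ. It then checks the result by evaluating the ratio of the standard error of the median, from (c), to $\sigma/\sqrt{n}$, from (a). Taking $\sigma_2 = 1$ is an assumption that costs nothing, since only the ratio $\sigma_1/\sigma_2$ matters.

```python
import math

def median_se_ratio(rho):
    """Standard error of the median divided by sigma/sqrt(n) for two equal-area
    normal curves with a common mean and sigma_1 / sigma_2 = rho."""
    sigma1, sigma2 = rho, 1.0                                   # only the ratio matters
    sigma = math.sqrt((sigma1 ** 2 + sigma2 ** 2) / 2)          # (a)
    se_median = math.sqrt(2 * math.pi) * sigma1 * sigma2 / (sigma1 + sigma2)  # (c) times sqrt(n)
    return se_median / sigma

# With u = rho + 1/rho the quartic becomes u**2 + 2*u - 4*pi = 0.
u = -1 + math.sqrt(1 + 4 * math.pi)
big_root = (u + math.sqrt(u * u - 4)) / 2
small_root = (u - math.sqrt(u * u - 4)) / 2
print(f"rho = {big_root:.4f} or {small_root:.4f}")
print(f"ratio at rho = 1:   {median_se_ratio(1.0):.4f}")   # single normal curve
print(f"ratio at the root:  {median_se_ratio(big_root):.4f}")
print(f"ratio at rho = 3:   {median_se_ratio(3.0):.4f}")
```

At ρ = 1 the ratio is the 1.2533 of the table in § 4. Beyond the root (about 2.236) it falls steadily below unity, which is the behaviour the text describes.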

Fig. 53.

6. In the case of a grouped frequency-distribution, if the
number of observations is sufficient to make the class-frequencies
run fairly smoothly, i.e. to enable us to regard the distribution
as nearly that of a very large sample, the standard error of any
percentile can be calculated very readily indeed, for we can
eliminate $\sigma$ from equation (1). Let $f_p$ be the frequency-per-class-interval
at the given percentile (simple interpolation will
give us the value with quite sufficient accuracy for practical
purposes, and if the figures run irregularly they may be smoothed).
Let $\sigma$ be the value of the standard-deviation expressed in
class-intervals, and let n be the number of observations as before.
Then since $y_p$ is the ordinate of the frequency-distribution when
drawn with unit standard-deviation and unit area, we must
have

$$y_p = \frac{\sigma f_p}{n}.$$

        <pb n="367" />
But this gives at once for the standard error expressed in terms
of the class-interval as unit

$$\sigma_\varepsilon = \frac{\sqrt{npq}}{f_p} \qquad (2)$$
As an example in which we can compare the results given by
the two different formulae (1) and (2), take the distribution of
stature used as an illustration in Chaps. VII. and VIII. and in
§§ 13, 14 of Chap. XV. The number of observations is 8585,
and the standard-deviation 2.57 in., the distribution being
approximately normal: $\sigma/\sqrt{n} = 0.027737$. Multiplying by the
factor 1.253... given in the table in § 4 gives 0.0348
as the standard error of the median, on the assumption of
normality of the distribution. Using the direct method of
equation (2), we find the median to be 67.47 (Chap. VII. § 15),
which is very nearly at the centre of the interval with a
frequency 1329. Taking this as being, with sufficient accuracy
for our present purpose, the frequency per interval at the median,
the standard error is

$$\frac{\tfrac{1}{2}\sqrt{8585}}{1329} = 0.0349.$$

As we should expect, the value is practically the same as that
obtained from the value of the standard-deviation on the assump-
tion of normality.

Let us find the standard error of the first and ninth deciles
as another illustration. On the assumption that the distribution
is normal, these standard errors are the same, and equal to
$0.027737 \times 1.70942 = 0.0474$. Using the direct method, we
find by simple interpolation the approximate frequencies per
interval at the first and ninth deciles respectively to be 590 and
570, giving standard errors of 0.0471 and 0.0488, mean 0.0479,
slightly in excess of that found on the assumption that the fre-
quency is given by the normal curve. The student should notice
that the class-interval is, in this case, identical with the unit of
measurement, and consequently the answer given by equation (2)
does not require to be multiplied by the magnitude of the
interval.

In the case of the distribution of pauperism (Chap. VII.,
Example i.), the fact that the class-interval is not a unit must
be remembered. The frequency at the median (3.195 per cent.)
is approximately 96. Using (2), with 632 observations, this gives a
standard error of the median of 0.1309 intervals, that is 0.0655 per cent.
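Equation (2) needs only n, p and the interpolated frequency $f_p$, so the worked figures of § 6 can be checked directly. In this sketch the function name is illustrative. The class-interval of 0.5 per cent., used to convert the pauperism result, is inferred from the text's own figures (0.1309 intervals = 0.0655 per cent.); this section does not state it.

```python
import math

def se_percentile_grouped(n, p, f_p):
    """Equation (2): standard error of a percentile, in class-intervals,
    from the number of observations n, the proportion p cut off, and the
    frequency per class-interval f_p at the percentile."""
    return math.sqrt(n * p * (1 - p)) / f_p

# Stature (class-interval = 1 inch): median, then first and ninth deciles.
print(round(se_percentile_grouped(8585, 0.5, 1329), 4))    # 0.0349
print(round(se_percentile_grouped(8585, 0.1, 590), 4))     # 0.0471
print(round(se_percentile_grouped(8585, 0.1, 570), 4))     # 0.0488

# Pauperism: 632 unions, frequency 96 at the median; interval assumed 0.5 per cent.
se_intervals = se_percentile_grouped(632, 0.5, 96)
print(round(se_intervals, 4), round(se_intervals * 0.5, 4))   # 0.1309 0.0655
```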

7. In finding the standard error of the difference between two
        <pb n="368" />
percentiles in the same distribution, the student must be careful
to note that the errors in two such percentiles are not
independent. Consider the two percentiles for which the values
of p and q are $p_1, q_1$ and $p_2, q_2$ respectively, the first-named being the
lower of the two percentiles. These two percentiles divide the
whole area of the frequency curve into three parts, the areas of
which are proportional to $q_1$, $1 - q_1 - p_2$, and $p_2$. Further, since
the errors in the first percentile are directly proportional to the
errors in $q_1$, and the errors in the second percentile are directly
proportional but of opposite sign to the errors in $p_2$, the correlation
between errors in the two percentiles will be the same as
the correlation between errors in $q_1$ and $p_2$, but of opposite sign.
But if there be a deficiency of observations below the lower
percentile, producing an error $\delta_1$ in $q_1$, the missing observations
will tend to be spread over the two other sections of the curve
in proportion to their respective areas, and will therefore tend to
produce an error

$$\delta_2 = -\frac{p_2}{p_1}\,\delta_1$$

in $p_2$. If then r be the correlation between errors in $q_1$ and $p_2$,
and $\varepsilon_1$, $\varepsilon_2$ their respective standard errors, we have

$$r\,\frac{\varepsilon_2}{\varepsilon_1} = -\frac{p_2}{p_1},$$

or, inserting the values of the standard errors,

$$r = -\sqrt{\frac{p_2 q_1}{p_1 q_2}}.$$

The correlation between the percentiles is the same in magnitude
but opposite in sign: it is obviously positive, and consequently

$$\text{correlation between errors in two percentiles} = \sqrt{\frac{p_2 q_1}{p_1 q_2}} \qquad (3)$$

If the two percentiles approach very close together, $q_1$ and $q_2$,
$p_1$ and $p_2$ become sensibly equal to one another, and the correlation
becomes unity, as we should expect.
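Equation (3) is simple enough to explore directly. The sketch below (the function name is hypothetical) follows the notation of § 7. There, $q_1$ is the proportion below the lower percentile and $p_2$ the proportion above the upper one, so that $p_1 = 1 - q_1$ and $q_2 = 1 - p_2$. The sketch shows three things: the value 1/3 for the two quartiles used in § 8, the much weaker correlation between the first and ninth deciles, and the approach to unity as the two percentiles close up.

```python
import math

def percentile_error_correlation(q1, p2):
    """Equation (3).  q1 is the proportion below the lower percentile and p2
    the proportion above the upper one; then p1 = 1 - q1 and q2 = 1 - p2."""
    p1, q2 = 1 - q1, 1 - p2
    return math.sqrt(p2 * q1 / (p1 * q2))

print(percentile_error_correlation(0.25, 0.25))    # lower and upper quartiles: 1/3
print(percentile_error_correlation(0.50, 0.25))    # median and upper quartile
print(percentile_error_correlation(0.10, 0.10))    # first and ninth deciles: 1/9
print(percentile_error_correlation(0.499, 0.499))  # nearly coincident: close to 1
```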

8. Let us apply the above value of the correlation between
percentiles to find the standard error of the semi-interquartile
range for the normal curve. Inserting $q_1 = p_2 = \tfrac14$, $q_2 = p_1 = \tfrac34$, we
find $r = \tfrac13$. Hence the standard error of the interquartile range
is, applying the ordinary formula for the standard-deviation of a
difference, $2/\sqrt{3}$ times the standard error of either quartile, or
        <pb n="369" />
the standard error of the semi-interquartile range $1/\sqrt{3}$ times
the standard error of a quartile. Taking the value of the
standard error of a quartile from the table in § 4, we have, finally,

$$\text{standard error of the semi-interquartile range in a normal distribution} = 0.78672\,\frac{\sigma}{\sqrt{n}} \qquad (4)$$

Of course the standard-deviation of the inter-quartile, or semi-
interquartile, range can readily be worked out in any particular
case, using equation (2) and the value of the correlation
given above: it is best to work out such standard errors
from first principles, applying the usual formula for the standard
deviation of the difference of two correlated variables (Chap. XI.
§ 2, equation (1)).
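The advice at the end of § 8, to work such standard errors out from first principles, can be followed in a few lines. This sketch assumes a normal distribution and takes the quartile ordinate from `statistics.NormalDist` (Python 3.8+). It applies equation (1) to each quartile, uses the correlation of 1/3 from equation (3), and then uses the usual formula for the standard-deviation of a difference of two correlated variables. Halving the result gives the semi-interquartile range. The helper name is illustrative.

```python
import math
from statistics import NormalDist

def se_semi_interquartile_normal(sigma, n):
    """Standard error of the semi-interquartile range of a normal distribution,
    built up as section 8 describes."""
    unit = NormalDist()
    y_q = unit.pdf(unit.inv_cdf(0.75))                          # ordinate at a quartile
    se_quartile = (sigma / y_q) * math.sqrt(0.25 * 0.75 / n)     # equation (1)
    r = math.sqrt(0.25 * 0.25 / (0.75 * 0.75))                   # equation (3): 1/3
    # Standard-deviation of the difference of two correlated quartiles,
    # each with the same standard error.
    se_range = math.sqrt(2 * se_quartile ** 2 * (1 - r))
    return se_range / 2                                          # halve for the semi-range

print(round(se_semi_interquartile_normal(1.0, 1), 5))   # about 0.78672, as in (4)
```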

9. If there is any failure of the conditions of simple sampling,
the formulae of the preceding sections cease, of course, to hold
good. We need not, however, enter again into a discussion of
the effect of removing the several restrictions. The effect on
the standard error of p was considered in detail in §§ 9-14 of
Chap. XIV., and the standard error of any percentile is directly
proportional to the standard error of p (cf. § 3). Further, the
student may be reminded that the standard error of any per-
centile measures solely the fluctuations that may be expected in
that percentile owing to the errors of simple sampling alone: it
has no bearing, therefore, save on the one question, whether an
observed divergence of the percentile, from a certain value that
might be expected to be yielded by a more extended series of
observations or that had actually been observed in some other
series, might or might not be due to fluctuations of simple
sampling alone. It cannot and does not give any indication of
the possibility of the sample being biassed or unrepresentative of
the material from which it has been drawn, nor can it give any
indication of the magnitude or influence of definite errors of
observation—errors which may conceivably be of greater im-
portance than errors of sampling. In the case of the distribution
of statures, for instance, the standard error almost certainly gives
quite a misleading idea as to the accuracy attained in determining
the average stature for the United Kingdom: the sample is not
representative, the several parts of the kingdom not contributing
in their true proportions. The student should refer again to the
discussion of these points in §§ 4-8 of Chap. XIV. Finally, we
may note that the standard error of a percentile cannot be
evaluated unless the number of observations is fairly large—large
enough to determine $f_p$ (eqn. 2) with reasonable accuracy, or
        <pb n="370" />
to test whether we may treat the distribution as approximately
normal (cf. also § 16 below).

(As regards the theory of sampling for the median and per-
centiles generally, cf. ref. 15, Laplace, Supplement II. (standard
error of the median), Edgeworth, refs. 5, 6, 7, and Sheppard, ref.
27: the preceding sections have been based on the work of
Edgeworth and Sheppard.)

10. Standard Error of the Arithmetic Mean.—Let us now pass
to a fresh problem, and determine the standard error of the
arithmetic mean.

This is very readily obtained. Suppose we note separately at
each drawing the value recorded on the first, second, third, ...
and nth card of our sample. The standard-deviation of the values
on each separate card will tend in the long run to be the same,
and identical with the standard-deviation $\sigma$ of x in an indefinitely
large sample, drawn under the same conditions. Further, the
value recorded on each card is (as we assume) uncorrelated with
that on every other. The standard-deviation of the sum of the
values recorded on the n cards is therefore $\sqrt{n}\,\sigma$, and the
standard-deviation of the mean of the sample is consequently
1/nth of this; or,

$$\sigma_m = \frac{\sigma}{\sqrt{n}} \qquad (5)$$

This is a most important and frequently cited formula, and the
student should note that it has been obtained without any
reference to the size of the sample or to the form of the
frequency-distribution. It is therefore of perfectly general application, if
$\sigma$ be known. We can verify it against our formula for the
standard-deviation of sampling in the case of attributes. The
standard-deviation of the number of successes in a sample of m
observations is $\sqrt{mpq}$. The standard-deviation of the total
number of successes in n samples of m observations each is
therefore $\sqrt{mn\,pq}$. Dividing by n we have the standard-deviation of
the mean number of successes in the n samples, viz. $\sqrt{mpq}/\sqrt{n}$,
agreeing with equation (5).
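Because equation (5) makes no assumption about the form of the distribution, a strongly skew parent is a fair test. The sketch below is illustrative only. It first uses an exponential distribution whose mean and standard-deviation both equal 1, an assumption chosen for convenience. It then repeats the check for the attribute case of the text, with m = 20 events of chance p = 0.1 per sample; both values are arbitrary. The seeds, sample sizes and helper name are hypothetical.

```python
import random
import statistics

def simulated_se_of_mean(draw, n, trials, rng):
    """Standard-deviation of the means of `trials` samples of n values,
    each value obtained by calling draw(rng)."""
    means = [statistics.fmean([draw(rng) for _ in range(n)]) for _ in range(trials)]
    return statistics.pstdev(means)

rng = random.Random(7)
n = 25

# A very skew parent: exponential with mean 1, whose standard-deviation is also 1.
sigma = 1.0
skew_se = simulated_se_of_mean(lambda r: r.expovariate(1.0), n, 4000, rng)
print(f"equation (5): {sigma / n ** 0.5:.4f}   simulated: {skew_se:.4f}")

# Attribute case: each value is the number of successes among m events of chance p,
# so its standard-deviation is sqrt(m p q) and (5) predicts sqrt(m p q) / sqrt(n).
m, p = 20, 0.1
attribute_se = (m * p * (1 - p)) ** 0.5 / n ** 0.5
count_se = simulated_se_of_mean(lambda r: sum(p > r.random() for _ in range(m)), n, 2000, rng)
print(f"sqrt(mpq)/sqrt(n): {attribute_se:.4f}   simulated: {count_se:.4f}")
```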

11. For a normal curve the standard error of the mean is to
the standard error of the median approximately as 100 to 125
(cf. § 4), and in general the standard errors of the two stand in
a somewhat similar ratio for a distribution not differing largely
from the normal form. For the distribution of statures used as
an illustration in § 6 the standard error of the median was found
to be 0.0349: the standard error of the mean is only 0.0277.
The distribution being very approximately normal, the ratio of
        <pb n="371" />
the two standard errors, viz. 1.26, assumes almost exactly the
theoretical magnitude. In the case of the asymmetrical distribution of
rates of pauperism, also used as an illustration in § 6, the standard
error of the median was found to be 0.0655 per cent. The
standard error of the mean is only 0.0493 per cent., which bears
to the standard error of the median a ratio of 1 to 1.33. As
such cases as these seem on the whole to be the more common
and typical, we stated in Chap. VII. § 18 that the mean is in
general less affected than the median by errors of sampling. At
the same time we also indicated the exceptional cases in which
the median might be the more stable—cases in which the mean
might, for example, be affected considerably by small groups of
widely outlying observations, or in which the frequency-distri-
bution assumed a form resembling fig. 53, but even more
exaggerated as regards the height of the central “peak” and the
relative length of the “tails.” Such distributions are not un-
common in some economic statistics, and they might be expected
to characterise some forms of experimental error. If, in these
cases, the greater stability of the median is sufficiently marked
to outweigh its disadvantages in other respects, the median
may be the better form of average to use. Fig. 53 represents
a distribution in which the standard errors of the mean and of the
median are the same. Further, in some experimental cases it is
conceivable that the median may be less affected by definite
experimental errors, the average of which does not tend to be
zero, than is the mean; this is, of course, a point quite distinct
from that of errors of sampling.

12. If two quite independent samples of $n_1$ and $n_2$ observations
respectively be drawn from a record, evidently $\varepsilon_{12}$, the standard
error of the difference of their means, is given by

$$\varepsilon_{12}^2 = \sigma^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right) \qquad (6)$$

If an observed difference exceed three times the value of $\varepsilon_{12}$
given by this formula it can hardly be ascribed to fluctuations
of sampling. If, in a practical case, the value of $\sigma$ is not known
a priori, we must substitute an observed value, and it would seem
natural to take as this value the standard-deviation in the two
samples thrown together. If, however, the standard-deviations
of the two samples themselves differ more than can be accounted
for on the basis of fluctuations of sampling alone (see below, § 15),
we evidently cannot assume that both samples have been drawn
from the same record: the one sample must have been drawn
from a record or a universe exhibiting a greater standard-deviation
        <pb n="372" />
than the other. If two samples be drawn quite independently
from different universes, indefinitely large samples from which
exhibit the standard-deviations $\sigma_1$ and $\sigma_2$, the standard error of
the difference of their means will be given by

$$\varepsilon_{12}^2 = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2} \qquad (7)$$

This is, indeed, the formula usually employed for testing the
significance of the difference between two means in any case.
Because the standard error of the mean depends on the
standard-deviation only, and not on the mean, of the distribution,
we can inquire whether the two universes from which samples
have been drawn differ in mean apart from any difference in
dispersion.

If two quite independent samples be drawn from the same
universe, but instead of comparing the mean of the one with the
mean of the other we compare the mean $m_1$ of the first with the
mean $m_0$ of both samples together, the use of (6) or (7) is not
justified, for errors in the mean of the one sample are correlated
with errors in the mean of the two together. Following precisely
the lines of the similar problem in § 13, Chap. XIII., case III., we
find that this correlation is $\sqrt{n_1}/\sqrt{n_1 + n_2}$, and hence

$$\varepsilon_{10}^2 = \sigma^2\,\frac{n_2}{n_1(n_1 + n_2)} \qquad (8)$$

(For a complete treatment of this problem in the case of samples
drawn from two different universes cf. ref. 22.)
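Formulae (6), (7) and (8) translate directly into code. In the sketch below, the function names and the numerical values of σ, $n_1$ and $n_2$ are hypothetical. The last lines check (8) against the general formula for the standard-deviation of a difference of two correlated variables, using the correlation $\sqrt{n_1}/\sqrt{n_1+n_2}$ quoted in the text.

```python
import math

def se_diff_same_universe(sigma, n1, n2):
    """Equation (6): two independent samples drawn from the same record."""
    return sigma * math.sqrt(1 / n1 + 1 / n2)

def se_diff_two_universes(sigma1, sigma2, n1, n2):
    """Equation (7): independent samples from universes with sds sigma1, sigma2."""
    return math.sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)

def se_mean_minus_pooled_mean(sigma, n1, n2):
    """Equation (8): mean of the first sample against the mean of both together."""
    return sigma * math.sqrt(n2 / (n1 * (n1 + n2)))

sigma, n1, n2 = 2.5, 40, 60                 # hypothetical values
sd_m1 = sigma / math.sqrt(n1)               # standard error of m1
sd_m0 = sigma / math.sqrt(n1 + n2)          # standard error of m0
r = math.sqrt(n1 / (n1 + n2))               # correlation quoted in the text
general = math.sqrt(sd_m1 ** 2 + sd_m0 ** 2 - 2 * r * sd_m1 * sd_m0)
print(general, se_mean_minus_pooled_mean(sigma, n1, n2))    # the two agree
print(se_diff_same_universe(sigma, n1, n2), se_diff_two_universes(sigma, sigma, n1, n2))
```

When the two universes share one standard-deviation, (7) reduces to (6), as the last line shows.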

13. The distribution of means of samples drawn under the
conditions of simple sampling will always be more symmetrical
than the distribution of the original record, and the symmetry
will be the greater the greater the number of observations in the
sample. Further, the distribution of means (and therefore also of
the differences between means) tends to become not merely sym-
metrical but normal. We can only illustrate, not prove, the
point here; but if the student will refer to § 13, Chap. XV., he will
see that the genesis of the normal curve in this case is in accordance
with what we then stated, viz. that the distribution tends to
be normal whenever the variable may be regarded as the sum
(or some slightly more complex function) of a number of other
variables. In the present instance this condition is strictly
fulfilled. The mean of the sample of n observations is the sum of
the values in the sample each divided by n, and we should expect
the distribution to be the more nearly normal the larger n. As
an illustration of the approach to symmetry even for small values
        <pb n="373" />
of n, we may take the following case. If the student will turn to
the calculated binomials, given as illustrations of the forms of
binomial distributions in Chap. XV. § 3, he will find there the
distribution of the number of successes for twenty events when
$q = 0.9$, $p = 0.1$: the distribution is extremely skew, starting at
zero, rising to high frequencies for 1 and 2 successes, and thence
tailing off to 20 cases of 7 successes in 10,000 throws, 4 cases of 8
successes and 1 case of 9 successes. But now find the distribu-
tion for the mean number of successes in groups of five throws,
under the same conditions. This will be equivalent to finding
the distribution of the number of successes for 100 such events,
and then dividing the observed number of successes by five—the
last process making no difference to the form of the distribution,
but only to its scale. But the distribution of the number of
successes for 100 events when $q = 0.9$, $p = 0.1$, is also given in
Chap. XV. § 3, and it will be seen that, while it is appreciably
asymmetrical, the divergence from symmetry is comparatively
small : the distribution has gained very greatly in symmetry
though only five observations have been taken to the sample.
We may therefore reasonably assume, if our sample is large,
that the distribution of means is approximately a normal dis-
tribution, and we may calculate, on that assumption, the fre-
quency with which any given deviation from a theoretical value
or a value observed in some other series, in an observed mean, will
arise from fluctuations of simple sampling alone.
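The binomial illustration of § 13 can be recomputed. This sketch (Python 3.8+ for `math.comb`) gives the expected frequencies per 10,000 throws in the tail of the distribution for twenty events with p = 0.1. It also compares a moment measure of skewness, $(q-p)/\sqrt{npq}$, for 20 and for 100 events. That measure is a standard one but is brought in here as an assumption; the text itself compares the two distributions by inspection. Dividing the number of successes by five alters only the scale, so the skewness of the mean of five samples is that of the 100-event binomial.

```python
from math import comb, sqrt

def binomial_frequencies(n, p, throws=10_000):
    """Expected frequencies of 0..n successes in `throws` trials of n events."""
    q = 1 - p
    return [throws * comb(n, k) * p ** k * q ** (n - k) for k in range(n + 1)]

def moment_skewness(n, p):
    """Moment measure of skewness of a binomial, (q - p) / sqrt(n p q)."""
    q = 1 - p
    return (q - p) / sqrt(n * p * q)

freq20 = binomial_frequencies(20, 0.1)
print([round(f) for f in freq20[7:10]])     # [20, 4, 1] for 7, 8, 9 successes
print(round(moment_skewness(20, 0.1), 3))   # 0.596, one sample of twenty events
print(round(moment_skewness(100, 0.1), 3))  # 0.267, mean of five such samples
```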

The warning is necessary, however, that the approach to
normality is only rapid if the condition that the several drawings
for each sample shall be independent is strictly fulfilled. If the
observations are not independent, but are to some extent positively
correlated with each other, even a fairly large sample may con-
tinue to reflect any asymmetry existing in the original distribution
(cf. ref. 32 and the record of sampling there cited).

If the original distribution be normal, the distribution of
means, even of small samples, is strictly normal. This follows at
once from the fact that any linear function of normally distributed
variables is itself normally distributed (Chap. XVI. § 6). The
distribution will not in general, however, be normal if the
deviation of the mean of each sample is expressed in terms of the
standard-deviation of that sample (cf. ref. 30).

14. Let us consider briefly the effect on the standard error of
the mean if the conditions of simple sampling as laid down in
§ 2 cease to apply.

(a) If we do not draw from the same record all the time, but
first draw a series of samples from one record, then another
series from another record with a somewhat different mean and
        <pb n="374" />
standard-deviation, and so on, or if we draw the successive
samples from essentially different parts of the same record, the
standard error will be greatly increased. For suppose we draw
$k_1$ samples from the first record, for which the standard-deviation
(in an indefinitely large sample) is $\sigma_1$, and the mean differs by
$d_1$ from the mean of all the records together (as ascertained by
large samples in numbers proportionate to those now taken); $k_2$
samples from the second record, for which the standard-deviation
is $\sigma_2$, and the mean differs by $d_2$ from the mean of all the records
together, and so on. Then for the samples drawn from the first
record the standard error of the mean will be $\sigma_1/\sqrt{n}$, but the
distribution will centre round a value differing by $d_1$ from the
mean for all the records together: and so on for the samples
drawn from the other records. Hence, if $\sigma_m$ be the standard error
of the mean, and N the total number of samples,

$$N\sigma_m^2 = \frac{1}{n}\,\Sigma(k\sigma^2) + \Sigma(kd^2).$$

But the standard-deviation $\sigma_0$ for all the records together is given
by

$$N\sigma_0^2 = \Sigma(k\sigma^2) + \Sigma(kd^2).$$

Hence, writing $\Sigma(kd^2) = N s_d^2$,

$$\sigma_m^2 = \frac{\sigma_0^2}{n} + \frac{n-1}{n}\,s_d^2 \qquad (9)$$
This equation corresponds precisely to equation (2) of § 9, Chap.
XIV. The standard error of the mean, if our samples are drawn
from different records or from essentially different parts of the
entire record, may be increased indefinitely as compared with the
value it would have in the case of simple sampling. If, for
example, we take the statures of samples of n men in a number
of different districts of England, and the standard-deviation of all
the statures observed is $\sigma_0$, the standard-deviation of the means
for the different districts will not be $\sigma_0/\sqrt{n}$, but will have some
greater value, dependent on the real variation in mean stature
from district to district.
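Equation (9) can be checked by simulation. In this sketch the two records, their means and standard-deviations (in inches, loosely suggestive of stature), the sample size and the seed are all invented for illustration. Equal numbers of samples are taken from each record, so the weights k are equal. The standard-deviation $s_d$ of the record means is computed about their unweighted mean, which is then the mean of all the records together.

```python
import random
import statistics

def se_mean_mixed_records(sigma0, s_d, n):
    """Equation (9): standard error of the mean when whole samples are drawn
    from different records whose means scatter with standard-deviation s_d;
    sigma0 is the standard-deviation of all the records together."""
    return (sigma0 ** 2 / n + (n - 1) / n * s_d ** 2) ** 0.5

records = [(66.0, 2.4), (68.0, 2.7)]       # hypothetical (mean, sd) of each record
s_d = statistics.pstdev([mean for mean, _ in records])
sigma0 = (statistics.fmean([sd ** 2 for _, sd in records]) + s_d ** 2) ** 0.5

rng = random.Random(3)
n, per_record = 10, 3000                   # equal numbers k of samples per record
means = [statistics.fmean([rng.gauss(mean, sd) for _ in range(n)])
         for mean, sd in records for _ in range(per_record)]

print(f"simple sampling, sigma0/sqrt(n): {sigma0 / n ** 0.5:.3f}")
print(f"equation (9):                    {se_mean_mixed_records(sigma0, s_d, n):.3f}")
print(f"simulated:                       {statistics.pstdev(means):.3f}")
```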

(b) If we are drawing from the same record throughout, but
always draw the first card from one part of that record, the
second card from another part, and so on, and these parts differ
more or less, the standard error of the mean will be decreased.
For if, in large samples drawn from the subsidiary parts of the
record from which the several cards are taken, the standard-deviations
are $\sigma_1, \sigma_2, \ldots, \sigma_n$ and the means differ by $d_1, d_2,$
        <pb n="375" />
$\ldots, d_n$ from the mean for a large sample from the entire record,
we have

$$\sigma_m^2 = \frac{1}{n^2}\,\Sigma(\sigma^2), \qquad \sigma_0^2 = \frac{1}{n}\,\Sigma(\sigma^2) + \frac{1}{n}\,\Sigma(d^2).$$

Hence, writing $\Sigma(d^2) = n s_d^2$,

$$\sigma_m^2 = \frac{\sigma_0^2 - s_d^2}{n} \qquad (10)$$
The last equation again corresponds precisely with that given for
the same departure from the rules of simple sampling in the case
of attributes (Chap. XIV. § 11, eqn. 4). If, to vary our previous
illustration, we had measured the statures of men in each of n
different districts, and then proceeded to form a set of samples
by taking one man from each district for the first sample, one
man from each district for the second sample, and so on, the
standard-deviation of the means of the samples so formed would
be appreciably less than the standard error of simple sampling
$\sigma_0/\sqrt{n}$. As a limiting case, it is evident that if the men in each
district were all of precisely the same stature, the means of all the
samples so compounded would be identical: in such a case, in fact,
$\sigma_0 = s_d$, and consequently $\sigma_m = 0$. To give another illustration, if
the cards from which we were drawing samples had been arranged
in order of the magnitude of X recorded on each, we would get
a much more stable sample by drawing one card from each
successive nth part of the record than by taking the sample
according to our previous rules—e.g. shaking them up in a bag
and taking out cards blindfold, or using some equivalent process.
The result is perhaps of some practical interest. It shows that,
if we are actually taking samples from a large area, different
districts of which exhibit markedly different means for the
variable under consideration, and are limited to a sample of n
observations ; if we break up the whole area into n sub-districts,
each as homogeneous as possible, and take a contribution to the
sample from each, we will obtain a more stable mean by this
orderly procedure than will be given, for the same number of
observations, by any process of selecting the districts from which
samples shall be taken by chance. There may, however, be a
greater risk of biassed error. The conclusions seem in accord
with common-sense.
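The contrast between case (b) and simple sampling shows up clearly in a simulation. The sketch assumes a record divided into n = 5 equally large parts, with invented means and a common internal standard-deviation. Under "simple" sampling the part for every card is picked by chance. Under "orderly" sampling one card is taken from each part, which is the procedure described above. All names and numbers are hypothetical.

```python
import random
import statistics

part_means = [64.0, 66.0, 67.5, 69.0, 71.0]   # hypothetical district means
part_sd = 1.5                                  # common spread within each part
n = len(part_means)                            # one part per card of the sample

s_d = statistics.pstdev(part_means)            # scatter of the part means
sigma0 = (part_sd ** 2 + s_d ** 2) ** 0.5      # sd of the whole record
se_random = sigma0 / n ** 0.5                  # equation (5), simple sampling
se_one_per_part = ((sigma0 ** 2 - s_d ** 2) / n) ** 0.5   # equation (10)

rng = random.Random(11)
trials = 5000
# Simple sampling: each card comes from a part chosen at random (parts equal in size).
random_means = [statistics.fmean([rng.gauss(rng.choice(part_means), part_sd)
                                  for _ in range(n)]) for _ in range(trials)]
# Orderly sampling: one card from each part in turn.
orderly_means = [statistics.fmean([rng.gauss(m, part_sd) for m in part_means])
                 for _ in range(trials)]

print(f"simple:  equation (5)  {se_random:.3f}, simulated {statistics.pstdev(random_means):.3f}")
print(f"orderly: equation (10) {se_one_per_part:.3f}, simulated {statistics.pstdev(orderly_means):.3f}")
```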
(c) Finally, suppose that, while our conditions (a) and (b) of § 2
hold good, the magnitude of the variable recorded on one card
drawn is no longer independent of the magnitude recorded on
        <pb n="376" />
another card, e.g. that if the first card drawn at any sampling
bears a high value, the next and following cards of the same
sample are likely to bear high values also. Under these
circumstances, if $r_{12}$ denote the correlation between the values on the
first and second cards, and so on,

$$\sigma_m^2 = \frac{\sigma^2}{n} + \frac{2\sigma^2}{n^2}\left(r_{12} + r_{13} + \cdots + r_{23} + \cdots\right).$$

There are $n(n-1)/2$ correlations; and if, therefore, r is the
arithmetic mean of them all, we may write

$$\sigma_m = \frac{\sigma}{\sqrt{n}}\sqrt{1 + r(n-1)} \qquad (11)$$

As the means and standard-deviations of $x_1, x_2, \ldots, x_n$ are all
identical, r may more simply be regarded as the correlation
coefficient for a table formed by taking all possible pairs of the
n values in every sample. If this correlation be positive, the
standard error of the mean will be increased, and for a given
value of r the increase will be the greater, the greater the size of
the samples. If r be negative, on the other hand, the standard
error will be diminished. Equation (11) corresponds precisely to
equation (6), § 13, of Chap. XIV.

As was pointed out in that chapter, the case when r is positive
covers the case discussed under (a). If we draw successive
samples from different records, such a positive correlation is at
once introduced, although the drawings of the several cards at
each sampling are quite independent of one another. Similarly,
the case discussed under (b) is covered by the case of negative
correlation. If each card is always drawn from a separate and
distinct part of the record, the correlation between any two x's will
on the average be negative: if some one card be always drawn
from a part of the record containing low values of the variable,
the others must on an average be drawn from parts containing
relatively high values. It is as well, however, to keep the cases
(a), (b), and (c) distinct, since a positive or negative correlation
may arise for reasons quite different from those considered under
(a) and (b).
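Equation (11) rests only on the rule for the variance of a sum, so it can be verified exactly, without simulation. The sketch below (helper names hypothetical) compares (11) with the brute-force sum of every entry of the covariance matrix of the n card values. It does this first for equal correlations, including a negative one as in case (b), and then for unequal correlations whose arithmetic mean is used as r.

```python
import math

def se_mean_correlated(sigma, n, r):
    """Equation (11): r is the arithmetic mean of the n(n-1)/2 correlations."""
    return sigma / math.sqrt(n) * math.sqrt(1 + r * (n - 1))

def se_mean_from_pairs(sigma, corr):
    """From first principles: variance of the sum of the n card values is the
    sum of all entries of their covariance matrix; divide by n squared."""
    n = len(corr)
    total = sum(sigma * sigma * corr[i][j] for i in range(n) for j in range(n))
    return math.sqrt(total) / n

sigma, n = 2.0, 6
for r in (0.3, 0.0, -0.1):            # positive, zero and negative correlation
    equal = [[1.0 if i == j else r for j in range(n)] for i in range(n)]
    print(r, se_mean_correlated(sigma, n, r), se_mean_from_pairs(sigma, equal))

# Unequal correlations r12 = 0.5, r13 = r23 = 0.2, whose mean is 0.3.
uneven = [[1.0, 0.5, 0.2], [0.5, 1.0, 0.2], [0.2, 0.2, 1.0]]
print(se_mean_correlated(sigma, 3, 0.3), se_mean_from_pairs(sigma, uneven))
```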

15. With this discussion of the standard error of the arithmetic
mean we must bring the present work to a close. To indicate
briefly our reasons for not proceeding further with the discussion
of standard errors, we must remind the student that in order to
express the standard error of the mean we require to know, in
addition to the mean itself, the standard-deviation about the mean,
or, in other words, the mean (deviation)² with respect to the mean.
        <pb n="377" />
Similarly, to express the standard error of the standard-deviation
we require to know, in the general case, the mean (deviation)⁴
with respect to the mean. Either, then, we must find this quantity
for the given distribution, which would entail entering on a
field of work which hitherto we have intentionally avoided, or we
must, if that be possible, assume the distribution to be of such a
form that we can express the mean (deviation)⁴ in terms of the
mean (deviation)². This can be done, as a fact, for the normal
distribution, but the proof would again take us rather beyond
the limits that we have set ourselves. To deal with the standard
error of the correlation coefficient would take us still further
afield, and the proof would be laborious and difficult, if not
impossible, without the use of the differential and integral
calculus. We must content ourselves, therefore, with a simple
statement of the standard errors of some of the more important
constants.

Standard-deviation.—If the distribution be normal,

$$\text{standard error of the standard-deviation in a normal distribution} = \frac{\sigma}{\sqrt{2n}} \qquad (12)$$

This is generally given as the standard error in all cases. It is,
however, by no means exact. The general expression is

$$\text{standard error of the standard-deviation in a distribution of any form} = \frac{1}{2}\sqrt{\frac{\mu_4 - \mu_2^2}{n\,\mu_2}} \qquad (13)$$

where $\mu_4$ is the mean (deviation)⁴, deviations being, of course,
measured from the mean, and $\mu_2$ the mean (deviation)², or the
square of the standard-deviation. Here n is assumed sufficiently large
to make the errors in the standard-deviation small compared with
that quantity itself. Equation (13) may in some cases give
values considerably greater (twice as great or more) than (12)
(cf. ref. 17). If, however, the distribution be normal, equation
(12) gives the standard error not merely of standard-deviations of
order zero, to use the terminology of Chap. XII., but of
standard-deviations of any order (ref. 33). It will be noticed, on reference
to equation (4) above, § 8, that the standard error of the
standard-deviation is less than that of the semi-interquartile range for a
normal distribution ($1/\sqrt{2} = 0.70711$ as against 0.78672, each
multiplied by $\sigma/\sqrt{n}$).
For a normal distribution, again, we have

$$\text{standard error of the coefficient of variation } v = \frac{v}{\sqrt{2n}}\left\{1 + 2\left(\frac{v}{100}\right)^2\right\}^{1/2} \qquad (14)$$
        <pb n="378" />
The expression in the bracket is usually very nearly unity, for
a normal distribution, and in that case may be neglected.
Correlation coefficient.—If the distribution be normal,

$$\text{standard error of the correlation coefficient for a normal distribution} = \frac{1 - r^2}{\sqrt{n}} \qquad (15)$$
This is the value always given: the use of a more general formula
which would entail the use of higher moments does not appear
to have been attempted. As regards the case of small samples,
cf. refs. 10, 28, and 31. Equation (15) gives the standard error
of a coefficient of any order, total or partial (ref. 33). For the
standard error of the correlation-coefficient for a fourfold table
(Chap. XI., § 10), see ref. 34: the formula (15) does not apply.
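As an illustration, the sketch below evaluates (15) and applies the rough rule, discussed in § 16, of comparing an observed coefficient with three times its standard error. It assumes a normal distribution and is not valid for a fourfold table; the names are hypothetical.

```python
import math

def se_correlation(r, n):
    """Equation (15): standard error of r for a normal distribution."""
    return (1 - r ** 2) / math.sqrt(n)

def exceeds_three_se(r, n):
    """True if r lies more than three times its standard error from zero."""
    return abs(r) > 3 * se_correlation(r, n)

print(se_correlation(0.5, 100), exceeds_three_se(0.5, 100))
```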
Coefficient of regression.—If the distribution be normal,

$$\text{standard error of the coefficient of regression } b_{12} \text{ for a normal distribution} = \frac{\sigma_1\sqrt{1 - r_{12}^2}}{\sigma_2\sqrt{n}} = \frac{\sigma_{1.2}}{\sigma_2\sqrt{n}} \qquad (16)$$
This formula again applies to a regression coefficient of any order, total or partial: i.e. in terms of our general notation, $t$ denoting any collection of secondary subscripts other than 1 or 2,

$$\text{standard error of } b_{12.t} \text{ for a normal distribution} = \frac{\sigma_{1.2t}}{\sigma_{2.t}\sqrt{n}}$$
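A sketch of (16) in Python, assuming a normal distribution and writing $\sigma_{1.2} = \sigma_1\sqrt{1 - r_{12}^2}$; the function name and the numerical values are illustrative only.

```python
import math

def se_regression(sigma1, sigma2, r12, n):
    """Equation (16): standard error of b12 for a normal distribution."""
    sigma1_2 = sigma1 * math.sqrt(1 - r12 ** 2)   # sigma_{1.2}
    return sigma1_2 / (sigma2 * math.sqrt(n))

# Here b12 = r12 * sigma1 / sigma2 = 0.6 * 2 / 1 = 1.2.
print(se_regression(2.0, 1.0, 0.6, 100))
```

For a partial coefficient the same function applies if the total standard-deviations are replaced by the corresponding partial ones, $\sigma_{1.t}$ and $\sigma_{2.t}$, and $r_{12}$ by $r_{12.t}$.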

Correlation ratio.—The general expression for the standard error of the correlation-ratio is somewhat complex (cf. Professor Pearson’s original memoir on the correlation-ratio, ref. 18, Chap. X.). In general, however, the standard error may be taken as given sufficiently closely by the above expression for the standard error of the correlation coefficient, that is to say,

$$\text{standard error of the correlation-ratio } \eta \text{ approximately} = \frac{1 - \eta^2}{\sqrt{n}} \qquad (17)$$
As was pointed out in Chap. X., § 21, the value of $\zeta = \eta^2 - r^2$ is a test for linearity of regression. Very approximately (Blakeman, ref. 1),

$$\text{standard error of } \zeta = \frac{2}{\sqrt{n}}\sqrt{\zeta}\,\sqrt{(1 - \eta^2)^2 - (1 - r^2)^2 + 1} \qquad (18)$$

For rough work the value of the second square root may be taken as nearly unity, and we then have the simple expression

$$\text{standard error of } \zeta \text{ roughly} = 2\sqrt{\frac{\zeta}{n}} \qquad (19)$$
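The sketch below evaluates (18) and the rough form (19) side by side. It assumes $\eta \ge |r|$, so that $\zeta$ is not negative; the names and numbers are illustrative.

```python
import math

def se_zeta(eta, r, n):
    """Equation (18), after Blakeman: standard error of zeta = eta**2 - r**2."""
    zeta = eta ** 2 - r ** 2
    second_root = math.sqrt((1 - eta ** 2) ** 2 - (1 - r ** 2) ** 2 + 1)
    return 2 / math.sqrt(n) * math.sqrt(zeta) * second_root

def se_zeta_rough(eta, r, n):
    """Equation (19): the second square root taken as unity."""
    return 2 * math.sqrt((eta ** 2 - r ** 2) / n)

# eta = 0.6, r = 0.5 from 400 observations, so zeta = 0.11.
print(se_zeta(0.6, 0.5, 400), se_zeta_rough(0.6, 0.5, 400))
```

Since $\eta^2 \ge r^2$ implies $(1 - \eta^2)^2 \le (1 - r^2)^2$, the second square root cannot exceed unity, so (19) errs on the side of a larger standard error.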

        <pb n="379" />
        XVII.—SIMPLER CASES OF SAMPLING FOR VARIABLES. 353
To convert any standard error to the probable error, multiply by the constant 0.674489 . . .
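The constant is the upper quartile of the unit normal curve, i.e. the deviation that a normally distributed error is as likely as not to exceed. A short check using Python's standard-library `statistics.NormalDist` (Python 3.8 or later); the helper name is hypothetical:

```python
from statistics import NormalDist

# Upper quartile of the unit normal curve: half the area lies within +/- this value.
PE_FACTOR = NormalDist().inv_cdf(0.75)

def probable_error(standard_error):
    """Convert a standard error to the corresponding probable error."""
    return PE_FACTOR * standard_error

print(round(PE_FACTOR, 6))  # 0.67449
```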

16. We need hardly restate once more the warnings given in
Chap. XIV., and repeated in § 9 above, that a standard error can
give no evidence as to the biassed or representative character of
a sample, nor as to the magnitude of errors of observation, but
we may, in conclusion, again emphasise the warnings given
in §§ 1-3, Chap. XIV., as to the use of standard errors when
the number of observations in the sample is small.

In the first place, if the sample be small, we cannot in general assume that the distribution of errors is approximately normal: it would only be normal in the case of the median (for which $p$ and $q$ are equal) and in the case of the mean of a normal distribution. Consequently, if $n$ be small, the rule that a range of three times the standard error includes the great majority of the fluctuations of simple sampling of either sign does not strictly apply, and the “probable error” becomes of doubtful significance.
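For reference, the proportions of a normally distributed error lying within one, two and three standard errors can be computed as below (a sketch using `statistics.NormalDist`; the function name is hypothetical). For three standard errors the figure is about 0.9973. When $n$ is small and the errors are not normal, these proportions cannot be relied upon.

```python
from statistics import NormalDist

def share_within(k):
    """Proportion of a normal distribution of errors within +/- k standard errors."""
    unit = NormalDist()
    return unit.cdf(k) - unit.cdf(-k)

for k in (1, 2, 3):
    print(k, round(share_within(k), 4))
```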

Secondly, it will be noted that the values of o and Y, in (1), of
Jn (2), and of o in (4) and (5), i.e. the values that would be
given for these constants by an indefinitely large sample drawn
under the same conditions, or the values that they possess in
the original record if the sample is unbiassed, are assumed to be
known a priori. But this is only the case in dealing with the
problems of artificial chance: in practical cases we have to use
the values given us by the sample itself. If this sample is based
on a considerable number of observations, the procedure is safe
enough, but if it be only a small sample we may possibly mis-
estimate the standard error to a serious extent. Following the
procedure suggested in Chap. XIV., some rough idea as to the
possible extent of under-estimation or over-estimation may be obtained, e.g. in the case of the mean, by first working out the standard error of $\sigma$ on the assumption that the values for the necessary moments are correct, and then replacing $\sigma$ in the expression for the standard error of the mean by $\sigma \pm$ three times its standard error so obtained.
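The rough procedure just described can be sketched as follows. This is an illustration only: it assumes the standard error of the mean is $\sigma/\sqrt{n}$, uses (13) for the standard error of $\sigma$ with the sample moments taken as correct, and clips the lower bound at zero; the function name and the sample figures are hypothetical.

```python
import math

def mean_se_range(xs):
    """Standard error of the mean, with sigma replaced by sigma -/+ 3 SE(sigma)."""
    n = len(xs)
    mean = sum(xs) / n
    mu2 = sum((x - mean) ** 2 for x in xs) / n
    mu4 = sum((x - mean) ** 4 for x in xs) / n
    sigma = math.sqrt(mu2)
    se_sigma = 0.5 * math.sqrt((mu4 - mu2 ** 2) / (mu2 * n))   # equation (13)
    low = max(sigma - 3 * se_sigma, 0.0) / math.sqrt(n)         # clipped at zero
    high = (sigma + 3 * se_sigma) / math.sqrt(n)
    return low, sigma / math.sqrt(n), high

# A small hypothetical sample of eight observations.
print(mean_se_range([12.1, 9.8, 11.4, 10.2, 13.0, 8.7, 10.9, 11.6]))
```

A wide gap between the first and last figures warns that the standard error of the mean itself is poorly determined by so small a sample.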

Finally, it will be remembered that unless the number of
observations is large, we cannot interpret the standard error of
any constant in the inverse sense, i.e. the standard error ceases
to measure with reasonable accuracy the standard-deviation of
true values of the constant round the observed value (Chap.
XIV. § 3). If the sample be large, the direct and inverse
standard errors are approximately the same.

        <pb n="380" />
REFERENCES.

The probable errors of various special coefficients, etc., are generally dealt
with in the memoirs concerning them, reference to which has been made in
the lists of previous chapters: reference has also been made before to most of
the memoirs concerning errors of sampling in proportions or percentages.
The following is a classification of some of the memoirs in the list below:

General : 18, 20.

Theory of fit of two distributions: 9, 19, 23.

Averages and percentiles: 5, 6, 7, 30, 32, 35, 36.

Standard deviation: 17, 26.

Coefficient of correlation (product-sum and partial correlations): 10,
12, 13, 28, 31, 33, 34.

Coefficient of correlation, other methods, normal coefficient, etc.: 24, 29.

Coefficients of association: 34.

Coefficient of contingency: 2, 25.

As regards the conditions under which it becomes valid to assume that the
distribution of errors is normal, cf. ref. 14.

(1) BLAKEMAN, J., “On Tests for Linearity of Regression in Frequency
Distributions,” Biometrika, vol. iv., 1905, p. 332.

(2) BLAKEMAN, J., and KARL PEARSON, “On the Probable Error of the
Coefficient of Mean Square Contingency,” Biometrika, vol. v., 1906,
p. 191.

(3) BOWLEY, A. L., The Measurement of Groups and Series ; C. &amp; E. Layton,
London, 1903.

(4) BOWLEY, A. L., Address to Section I of the British Association, 1906.

(5) EDGEWORTH, F. Y., “Observations and Statistics: An Essay on the
Theory of Errors of Observation and the First Principles of Statistics,”
Cambridge Phil. Trans., vol. xiv., 1885, p. 139.

(6) EDGEWORTH, F. Y., “Problems in Probabilities,” Phil. Mag., 5th Series,
vol. xxii., 1886, p. 371.

(7) EDGEWORTH, F. Y., “The Choice of Means,” Phil. Mag., 5th Series,
vol. xxiv., 1887, p. 268.

(8) EDGEWORTH, F. Y., “On the Probable Errors of Frequency Constants,”
Jour. Roy. Stat. Soc., vol. lxxi., 1908, pp. 381, 499, 651; and
Addendum, vol. lxxii., 1909, p. 81.

(9) ELDERTON, W. PALIN, “Tables for Testing the Goodness of Fit of Theory
to Observation,” Biometrika, vol. i., 1902, p. 155.

(10) FISHER, R. A., “The Frequency Distribution of the Values of the
Correlation Coefficient in Samples from an Indefinitely large Population,”
Biometrika, vol. x., 1915, p. 507.

(11) GIBSON, WINIFRED, “Tables for Facilitating the Computation of
Probable Errors,” Biometrika, vol. iv., 1906, p. 385.

(12) HERON, D., “An Abac to determine the Probable Errors of Correlation
Coefficients,” Biometrika, vol. vii., 1910, p. 411. (A diagram giving
the probable error for any number of observations up to 1000.)

(13) HERON, D., “On the Probable Error of a Partial Correlation Coefficient,”
Biometrika, vol. vii., 1910, p. 411. (A proof, on ordinary algebraic
lines, for the case of three variables, of the result given in (33).)

(14) ISSERLIS, L., “On the Conditions under which the ‘Probable Errors’ of
Frequency Distributions have a real Significance,” Proc. Roy. Soc.,
Series A, vol. xcii., 1915, p. 23.

(15) LAPLACE, PIERRE SIMON, Marquis de, Théorie des probabilités, 2e édn.,
1814. (With four supplements.)

(16) PEARL, RAYMOND, “The Calculation of Probable Errors of Certain
Constants of the Normal Curve,” Biometrika, vol. v., 1906, p. 190.

        <pb n="381" />

(17) PEARL, RAYMOND, “On certain Points concerning the Probable Error
of the Standard-deviation,” Biometrika, vol. vi., 1908, p. 112. (On
the amount of divergence, in certain cases, from the standard error
$\sigma/\sqrt{2n}$ in the case of a normal distribution.)

(18) PEARSON, KARL, and L. N. G. FILON, “On the Probable Errors of
Frequency Constants, and on the Influence of Random Selection on
Variation and Correlation,” Phil. Trans. Roy. Soc., Series A, vol. cxci.,
1898, p. 229.

(19) PEARSON, KARL, “On the Criterion that a given System of Deviations
from the Probable in the Case of a Correlated System of Variables is
such that it can be reasonably supposed to have arisen from Random
Sampling,” Phil. Mag., 5th Series, vol. l., 1900, p. 157.

(20) PEARSON, KARL, and others (editorial), “On the Probable Errors of
Frequency Constants,” Biometrika, vol. ii., 1903, p. 273, and vol. ix.,
1913, p. 1. (Useful for the general formulae given, based on the
general case without respect to the form of the frequency-distribution.)

(21) PEARSON, KARL, “On the Curves which are most suitable for describing
the Frequency of Random Samples of a Population,” Biometrika, vol.
v., 1906, p. 172.

(22) PEARSON, KARL, “Note on the Significant or Non-significant Character
of a Sub-sample drawn from a Sample,” Biometrika, vol. v., 1906,
p. 181.

(23) PEARSON, KARL, “On the Probability that two Independent Distributions
of Frequency are really Samples from the same Population,”
Biometrika, vol. viii., 1911, p. 250, and vol. x., 1914, p. 85.

(24) PEARSON, KARL, “On the Probable Error of a Coefficient of Correlation
as found from a Fourfold Table,” Biometrika, vol. ix., 1913, p. 22.

(25) PEARSON, KARL, “On the Probable Error of a Coefficient of Mean
Square Contingency,” Biometrika, vol. x., 1915, p. 590.

(26) RHIND, A., “Tables for Facilitating the Computation of Probable Errors
of the Chief Constants of Skew Frequency-distributions,” Biometrika,
vol. vii., 1909-10, p. 127 and p. 386.

(27) SHEPPARD, W. F., “On the Application of the Theory of Error to Cases
of Normal Distribution and Normal Correlation,” Phil. Trans. Roy.
Soc., Series A, vol. cxcii., 1898, p. 101.

(28) SOPER, H. E., “On the Probable Error of the Correlation Coefficient
to a Second Approximation,” Biometrika, vol. ix., 1913, p. 91.

(29) SOPER, H. E., “On the Probable Error of the Bi-serial Expression for
the Correlation Coefficient,” Biometrika, vol. x., 1914, p. 384.

(30) “STUDENT,” “On the Probable Error of a Mean,” Biometrika, vol. vi.,
1908, p. 1. (The standard error of the mean in terms of the standard
error of the sample.)

(31) “STUDENT,” “On the Probable Error of a Correlation Coefficient,”
Biometrika, vol. vi., 1908, p. 302. (The problem of the probable error
with small samples.)

(32) “STUDENT,” “On the Distribution of Means of Samples which are not
drawn at Random,” Biometrika, vol. vii., 1909, p. 210.

(33) YULE, G. U., “On the Theory of Correlation for any number of Variables
treated by a New System of Notation,” Proc. Roy. Soc., Series
A, vol. lxxix., 1907, p. 182. (See pp. 192-3 at end.)

(34) YULE, G. U., “On the Methods of Measuring Association between two
Attributes,” Jour. Roy. Stat. Soc., vol. lxxvi., 1912. (Probable error
of the correlation coefficient for a fourfold table, of association
coefficients, etc.)
Reference may also be made to the following, which deal for the
most part with the effects of errors other than errors of sampling:
        <pb n="382" />
(35) BOWLEY, A. L., “Relations between the Accuracy of an Average and
that of its Constituent Parts,” Jour. Roy. Stat. Soc., vol. lx., 1897,
p. 855.
(36) BOWLEY, A. L., “The Measurement of the Accuracy of an Average,”
Jour. Roy. Stat. Soc., vol. lxxv., 1911, p. 77.
EXERCISES.

1. For the data in the last column of Table IX., Chap. VI. p. 95, find
the standard error of the median (154.7 lbs.).

2. For the same distribution, find the standard errors of the two quartiles
(142.5 lbs., 168.4 lbs.).

3. For the same distribution, find the standard error of the semi-inter-
quartile range.

4. The standard-deviation of the same distribution is 21.3 lbs. Find the
standard error of the mean, and compare its magnitude with that of the
standard error of the median (Qn. 1).

5. Work out the standard error of the standard deviation for the distribution
of statures used as an illustration in § 6. (Standard-deviation 2.57 in.;
8585 observations.) Compare the ratio of standard error of standard-
deviation to the standard deviation, with the ratio of the standard error of
the semi-interquartile range to the semi-interquartile range, assuming the
distribution normal.

6. Calculate a small table giving the standard errors of the correlation
coefficient, based on (1) 100, (2) 1000 observations, for values of r = 0, 0.2, 0.4,
0.6, 0.8, assuming the distribution normal.
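A sketch for Exercise 6, assuming equation (15) and a normal distribution; it tabulates $(1 - r^2)/\sqrt{n}$ rounded to four places. The function name is hypothetical.

```python
import math

def correlation_se_table(ns=(100, 1000), rs=(0.0, 0.2, 0.4, 0.6, 0.8)):
    """Standard errors (1 - r**2) / sqrt(n) of r, rounded to four places."""
    return {n: [round((1 - r * r) / math.sqrt(n), 4) for r in rs] for n in ns}

for n, row in correlation_se_table().items():
    print(n, row)
```

Because the standard error falls only as $1/\sqrt{n}$, the second row is smaller than the first by a factor of about 3.16, not 10.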

        <pb n="383" />
        APPENDIX I.
TABLES FOR FACILITATING STATISTICAL WORK.
A. CALCULATING TABLES.
For heavy arithmetical work an arithmometer is, of course,
invaluable ; but, owing to their cost, arithmetic machines are, as a
rule, beyond the reach of the student. For a great deal of simple
work, especially work not intended for publication, the student
will find a slide-rule exceedingly useful : particulars and prices
will be found in any instrument maker's catalogue. A plain
25-cm. rule will serve for most ordinary purposes, or if greater
accuracy is desired, a 50-cm. rule, a Fuller spiral rule, or one of
the Hannyngton-pattern rules (Aston &amp; Mander, London), in which
the scale is broken up into a number of parallel segments, may be
preferred. For greater exactness in multiplying or dividing,
logarithms are almost essential : five-figure tables suffice if answers
are only desired true to five digits ; if greater accuracy is needed,
seven-figure tables must be used. It is hardly necessary to cite
special editions of tables of logarithms here, but attention may
perhaps be directed to the recently issued eight-figure tables of
Bauschinger and Peters (W. Engelmann, Leipzig, and Asher &amp; Co.,
London, 1910; vol. i. containing logarithms of all numbers from
1 to 200,000, price 18s. 6d. net; vol. ii. containing logs. of
trigonometric functions).

If it is desired to avoid logarithms, extended multiplication
tables are very useful. There are many of these, and four of
different forms are cited below. Zimmermann’s tables are inex-
pensive and recommended for the elementary student, Cotsworth’s,
Crelle’s, or Peters’ tables for more advanced work. Barlow's tables
are invaluable for calculating standard-deviations of ungrouped
observations and similar work.

(1) BARLOW’S Tables of Squares, Cubes, Square-roots, Cube-roots, and Recip-
rocals of all Integer Numbers up to 10,000; E. &amp; F. N. Spon,
London and New York; stereotype edition, price 6s.

        <pb n="384" />

(2) COTSWORTH, M. B., The Direct Calculator, Series O. (Product table to
1000 x 1000.) M‘Corquodale &amp; Co., London ; price with thumb index,
25s. ; without index, 21s.

(3) CRELLE, A. L., Rechentafeln. (Multiplication table giving all products up
to 1000x1000.) Can be obtained with explanatory introduction in
German or in English. G. Reimer, Berlin ; price 16s.

(4) ELDERTON, W. P., “Tables of Powers of Natural Numbers, and of the
Sums of Powers of the Natural Numbers from 1 to 100” (gives
powers up to seventh), Biometrika, vol. ii. p. 474.

(5) PETERS, J., Neue Rechentafeln für Multiplikation und Division. (Gives
products up to 100 x 10,000 : more convenient than Crelle for forming
four-figure products. Introduction in English, French or German.)
G. Reimer, Berlin ; price 15s.

(6) ZIMMERMANN, H., Rechentafel, nebst Sammlung häufig gebrauchter
Zahlenwerthe. (Products of all numbers up to 100 x 1000 : subsidiary
tables of squares, cubes, square-roots, cube-roots and reciprocals, etc.
for all numbers up to 1000 at the foot of the page.) W. Ernst &amp; Son,
Berlin ; price 5s. ; English edition, Asher &amp; Co., London, 6s.

B. SPECIAL TABLES OF FUNCTIONS, ETC.
Several tables of service will be found in the works cited in
Appendix II., e.g. a table of Gamma Functions in Elderton’s
book (12) and a table of six-figure logarithms of the factorials
of all numbers from 1 to 1100 in De Morgan’s treatise (11). The
majority of the tables in the list below, which were originally
published in Biometrika, together with others, are contained in
Tables for Statisticians and Biometricians, Part I., edited by Karl
Pearson (Cambridge University Press, price 15s. net).

(7) DAVENPORT, C. B., Statistical Methods, with especial reference to Biological
Variation; New York, John Wiley; London, Chapman &amp;
Hall; second edition, 1904. (Tables of area and ordinates of the
normal curve, gamma functions, probable errors of the coefficient of
correlation, powers, logarithms, etc.)

(8) DUFFELL, J. H., “Tables of the Gamma-function,” Biometrika, vol. vii.,
1909, p. 43. (Seven-figure logarithms of the function, proceeding by
differences of 0001 of the argument.)

(9) ELDERTON, W. P., “Tables for Testing the Goodness of Fit of Theory to
Observation,” Biometrika, vol. i., 1902, p. 155.

(10) EVERITT, P. F., “Tables of the Tetrachoric Functions for Fourfold
Correlation Tables,” Biometrika, vol. vii., 1910, p. 437, and vol.
viii., 1912, p. 385. (Tables for facilitating the calculation of the cor-
relation coefficient of a fourfold table by Pearson’s method on the
assumption that it is a grouping of a normally distributed table; cf.
ref. 14 of Chap. XVI.)

(11) GIBSON, WINIFRED, “Tables for Facilitating the Computation of Probable
Errors,” Biometrika, vol. iv., 1906, p. 385.

(12) HERON, D., “An Abac to determine the Probable Errors of Correlation
Coefficients,” Biometrika, vol. vii., 1910, p. 411. (A diagram giving
the probable error for any number of observations up to 1000.)

(13) LEE, ALICE, “Tables of F(r, ν) and H(r, ν) Functions,” British Association
Report, 1899. (Functions occurring in connection with Professor
Pearson’s frequency curves.)

        <pb n="385" />
        APPENDIX I.—SPECIAL TABLES OF FUNCTIONS, ETC. 359

(14) LEE, ALICE, “Tables of the Gaussian ‘Tail-functions,’ when the ‘tail’
is larger than the body,” Biometrika, vol. x., 1914, p. 208.

(15) RHIND, A., “Tables for Facilitating the Computation of Probable Errors
of the Chief Constants of Skew Frequency-distributions,” Biometrika,
vol. vii., 1909-10, p. 127 and p. 386.

(16) SHEPPARD, W. F., “New Tables of the Probability Integral,” Biometrika,
vol. ii., 1903, p. 174. (Includes not merely table of areas of the normal
curve (to seven figures), but also a table of the ordinates to the same
degree of accuracy.)

(17) SHEPPARD, W. F., “Table of Deviates of the Normal Curve” (with
introductory article on Grades and Deviates by Sir Francis Galton),
Biometrika, vol. v., 1907, p. 404. (A table giving the deviation of
the normal curve, in terms of the standard-deviation as unit, for the
ordinates which divide the area into a thousand equal parts.)

A number of useful tables will be found in the series “Tracts
for Computers,” published by the Cambridge University Press for
the Department of Applied Statistics, University College, London.
A list is usually given in the advertisement pages of the current
issue of Biometrika.

A Part II. of Tables for Statisticians is announced for issue in
the near future.
        <pb n="386" />
        APPENDIX II.

SHORT LIST OF WORKS ON THE MATHEMATICAL
THEORY OF STATISTICS AND THE THEORY OF
PROBABILITY.

THE student may find the following short list of service, as
supplementing the lists of references given at the ends of the
several chapters, the latter containing, as a rule, original memoirs
only. The economic student who wishes to know more of the
practical side of statistics may be referred to Mr A. L. Bowley’s
“Elements” (6 below), to An Elementary Manual of Statistics
(3rd edn., P. S. King &amp; Sons, London, 1925), by the same writer
(useful as a general guide to English statistics), and to M. Jacques
Bertillon’s Cours élémentaire de statistique (Société d’éditions
scientifiques, 1895: international in scope). Dr A. Newsholme’s
Vital Statistics (Swan Sonnenschein, 3rd edn., 1899) will also be
of service to students of that subject.

The great majority of the works mentioned in the following
list, with others which it has not been thought necessary to
include, are in the library of the Royal Statistical Society.

(1) AIRY, Sir G. B., On the Algebraical and Numerical Theory of Errors of
Observations ; 1st edn., 1861 ; 3rd edn., 1879.

(2) BERNOULLI, J., Ars conjectandi, opus posthumum: Accedit tractatus de
seriebus infinitis, et epistola gallicé scripta de ludo pilae reticularis,
1713. (A German translation in Ostwald’s Klassiker der exakten
Wissenschaften, Nos. 107, 108.)

(3) BERTRAND, J. L. F., Calcul des probabilités ; Gauthier-Villars, Paris, 1889.

(4) BETZ, W., Ueber Korrelation ; Beihefte zur Zeitschrift für ang. Psych.
und psych. Sammelforschung ; J. A. Barth, Leipzig, 1911. (Applications
to psychology.)

(5) BOREL, E., Éléments de la théorie des probabilités ; Hermann, Paris, 1909.

(6) BOWLEY, A. L., Elements of Statistics; P. S. King, London ; 1st edn.,
1901 ; 3rd edn., 1907.

(7) BROWN, W., The Essentials of Mental Measurement; Cambridge University
Press, 1911. (Part 2 on the theory of correlation : applications
to experimental psychology.)

(8) BRUNS, H., Wahrscheinlichkeitsrechnung und Kollektivmasslehre ;
Teubner, Leipzig, 1906.
        <pb n="387" />
        APPENDIX II.—SHORT LIST OF WORKS.
(9) COURNOT, A. A., Exposition de la théorie des chances et des probabilités,
1843.

(10) CZUBER, E., Wahrscheinlichkeitsrechnung und ihre Anwendung auf
Fehlerausgleichung, Statistik und Lebensversicherung ; Teubner,
Leipzig, 2nd edn., vol. i., 1908-10.

(11) DE MORGAN, A., Treatise on the Theory of Probabilities (extracted from
the Encyclopedia Metropolitana), 1837.

(12) ELDERTON, W. P., Frequency Curves and Correlation ; C. &amp; E. Layton,
London, 1906. (Deals with Professor Pearson’s frequency curves and
correlation, with illustrations chiefly of actuarial interest.)

(13) FECHNER, G. T., Kollektivmasslehre (posthumously published ; edited
by G. F. Lipps) ; Engelmann, Leipzig, 1897.

(14) GALLOWAY, T., Treatise on Probability (republished from the 7th edn.
of the Encyclopedia Britannica), 1839.

(15) GAUSS, C. F., Méthode des moindres carrés: Mémoires sur la combinaison
des observations, traduits par J. Bertrand, 1855.

(16) JOHANNSEN, W., Elemente der exakten Erblichkeitslehre ; Fischer, Jena,
2te Ausgabe, 1913. (Very largely concerned with an exposition of the
statistical methods.)

(17) LAPLACE, PIERRE SIMON, Marquis de, Essai philosophique sur les
probabilités, 1814. (The introduction to 18, separately printed with
some modifications.)

(18) LAPLACE, PIERRE SIMON, Marquis de, Théorie analytique des probabilités ;
2nd edn., 1814, with supplements 1 to 4.

(19) LEXIS, W., Abhandlungen zur Theorie der Bevölkerungs- und Moralstatistik ;
Fischer, Jena, 1903.

(20) POINCARÉ, H., Calcul des probabilités ; Gauthier-Villars, Paris, 1896.

(21) POISSON, S. D., Recherches sur la probabilité des jugements en matière
criminelle et en matière civile, précédées des règles générales du calcul
des probabilités, 1837. (German translation by C. H. Schnuse, 1841.)

(22) QUETELET, L. A. J., Lettres sur la théorie des probabilités, appliquée aux
sciences morales et politiques, 1846. (English translation by O. G.
Downes, 1849.)

(23) THORNDIKE, E. L., An Introduction to the Theory of Mental and Social
Measurements, Science Press, New York, 1904.

(24) VENN, J., The Logic of Chance: an Essay on the Foundations and
Province of the Theory of Probability, with especial reference to its
Logical Bearings and its Application to Moral and Social Science and to
Statistics ; 3rd edn., Macmillan, London, 1888.

(25) WESTERGAARD, H., Die Grundzüge der Theorie der Statistik ; Fischer,
Jena, 1890.

        <pb n="388" />
        SUPPLEMENTS.
I. NOTES SUPPLEMENTARY TO CHAPTER VI.

6. Position of Intervals.—It is said in the text that in some
exceptional cases the observations exhibit a marked clustering
round certain values. The word exceptional should hardly have
been used. Whenever there is some doubt as to the final digit
in reading a scale, scope is given to the idiosyncrasies of the
observer and the distribution of frequency over the final digits
is rarely uniform. The most conspicuous feature is usually the
tendency to round off to the nearest unit, thus making 0 the
most frequent final digit, but 5’s may also be emphasised if
emphasised on the scale itself, and the excesses of 0’s and 5’s
may be drawn in the most diverse ways from the other parts of
the scale.

TABLE A.—Frequency-distributions of Final Digits in Measurements by
Four Observers.

Frequency of Final Digit per 1000.

Final Digit.               A       B       C       D
0                        158     122     231     358
1                         97      98      37      49
2                        125      98      80      90
3                         73      90      72      63
4                         76     100      55      37
5                         71     112     222     211
6                         90      98      71      62
7                         56      99      75      70
8                        126     101      72      44
9                        129      81      65      16
Total                   1001     999    1000    1000
Actual observations     1258    3000    1000    1000

        <pb n="389" />
        SUPPLEMENTS—NOTES SUPPLEMENTARY TO CHAPTER VI. 363

Table A shows results for four observers as illustrations, the
frequencies being reduced for comparability to a total of 1000.
Column A is based on measures by myself, on drawings, to the
nearest tenth of a millimetre. It is recognised, of course, that
measures cannot really be made to such a degree of precision ;
but I believed that I was making them carefully, and as they
were made with a Zeiss scale, in which the divisions are ruled
on the under side of a piece of plate-glass, readings are unaffected
by parallax. Nevertheless it will be seen that I heavily over-
emphasised the zeros, and also 2, 8 and 9—an odd selection of
preferences! On the whole, the centre of the millimetre was
neglected and measures piled up at the two ends.

The data for columns B, C, and D were all drawn from the
same published report, and refer to sundry head measurements
taken on the living subject. Guided by a statement in the introduction,
it was possible to compile the data separately for the
three assistants (B, C, D) who had done the actual measuring.
It will be seen that B was rather good : there is a relatively slight
excess at 0 and 5, but otherwise his measurements are fairly
uniformly distributed. C was decidedly not good, rounding off
nearly one measurement in two to the nearest centimetre or
half-centimetre. D was simply outrageously bad—so bad that
it might have been better not to publish his measurements.
Nearly 57 per cent. of his measurements are made only to the
nearest centimetre or half centimetre—a quite inadequate degree
of precision for head measurements often only a few centimetres
in magnitude.

Compilation of data in the form of Table A is recommended as
some control of their value, and as a check on assistants.
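Compiling such a table is easily mechanised. The sketch below is a hypothetical helper, not a method from the text: it assumes the readings are kept as strings exactly as recorded, so that final zeros are not lost, and it reduces the counts to a total of 1000 as in Table A. With no digit preference each entry would be near 100.

```python
from collections import Counter

def final_digit_per_thousand(readings):
    """Frequency of each final digit (0 to 9) per 1000 readings, as in Table A.

    Readings are strings exactly as recorded, so that a reading such as
    '12.0' keeps its final zero.
    """
    counts = Counter(reading.strip()[-1] for reading in readings)
    total = sum(counts.values())
    return [round(1000 * counts[str(d)] / total) for d in range(10)]

print(final_digit_per_thousand(["10.0", "10.5", "11.0", "11.3"]))
```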

15. The Extremely Asymmetrical or J-shaped Distribution.—
Dr J. C. Willis has shown that any number of illustrations of
this form of distribution may be obtained by compiling the
frequency distribution for numbers of genera with 1, 2, 3 . . .
species in any biological group. Table B shows the distribution
for the Chrysomelid beetles.

        <pb n="390" />

TABLE B.—Chrysomelidae (beetles). Numbers of Genera with 1, 2, 3 . . .
Species. (Compiled by Dr J. C. Willis, F.R.S. ; cited from G. U. Yule,
“A Mathematical Theory of Evolution based on the Conclusions of Dr
J. C. Willis,” Phil. Trans., B, vol. ccxiii., 1924, p. 85.)

Species. Genera. Species. Genera. Species. Genera.
1 215 32 74 !
90 33 76 1
38 34 77 :
35 35 79 :
21 36 &amp;3
ie 7 : x4
HH 23 /
: z 29
4 i 92
10 = ¢ 93
oy 110
: 4. 114
i 115
43 128
49 132
50 : 133
52 146
h3 163
; 3 196
2 3 217
Z J 227
22 264
o 327
, 399
oo 417
A 4 681
L7 Lud
28 : 71
29 v 72 Total 7
30 3 va
        <pb n="391" />
        SUPPLEMENTS — FORMULA FOR REGRESSIONS.
II. DIRECT DEDUCTION OF THE FORMULAE
FOR REGRESSIONS.
(Supplementary to Chapters IX. and XII.)
To those who are acquainted with the differential calculus the
following direct proof may be useful. It is on the lines of the
proof given in Chapter XII. § 3.
Taking first the case of two variables (Chapter IX.), it is required to determine values of $a_1$ and $b_1$ in the equation

$$x = a_1 + b_1 y$$

(where $x$ and $y$ denote deviations from the respective means) that will make the sum of the squares of the errors like

$$u = x' - (a_1 + b_1 y')$$

a minimum, $x'$ and $y'$ being a pair of associated deviations. The required equations for determining $a_1$ and $b_1$ will be given by differentiating

$$\Sigma(u^2) = \Sigma(x - a_1 - b_1 y)^2$$

with respect to $a_1$ and to $b_1$ and equating to zero. Differentiating with respect to $a_1$, we have

$$\Sigma(x - a_1 - b_1 y) = 0.$$

But $\Sigma(x) = \Sigma(y) = 0$, and consequently we have $a_1 = 0$. Dropping $a_1$, and differentiating with respect to $b_1$,

$$\Sigma\{(x - b_1 y)\,y\} = 0.$$

That is,

$$b_1 = \frac{\Sigma(xy)}{\Sigma(y^2)} = r\frac{\sigma_x}{\sigma_y},$$

as on p. 171.

Similarly, if we determine the values of $a_2$ and $b_2$ in the equation

$$y = a_2 + b_2 x$$

that will make the sum of the squares of the errors like

$$v = y' - (a_2 + b_2 x')$$

a minimum, we shall find

$$a_2 = 0, \qquad b_2 = \frac{\Sigma(xy)}{\Sigma(x^2)} = r\frac{\sigma_y}{\sigma_x}.$$
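The result can be checked numerically. In the sketch below (Python; the names and data are illustrative) the coefficients are computed from deviations about the means; the product $b_1 b_2$ equals $r^2$, and a small change in either $a_1$ or $b_1$ increases the sum of the squares of the errors.

```python
import math

def regression_coefficients(xs, ys):
    """Return b1 (x on y), b2 (y on x) and r, from deviations about the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    x = [v - mean_x for v in xs]
    y = [v - mean_y for v in ys]
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    b1 = sxy / syy                 # S(xy) / S(y^2)
    b2 = sxy / sxx                 # S(xy) / S(x^2)
    r = sxy / math.sqrt(sxx * syy)
    return b1, b2, r

b1, b2, r = regression_coefficients([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
print(b1, b2, r)
```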

        <pb n="392" />

If, as in Chapter XII. §§ 4 et seq. (cf. especially § 7), a number of variables are involved, the equations for determining the coefficients will be given by differentiating

$$\Sigma\left(x_1 - b_{12.34\ldots n}\,x_2 - b_{13.24\ldots n}\,x_3 - \dots - b_{1n.23\ldots(n-1)}\,x_n\right)^2$$

with respect to each coefficient in turn and equating the result to zero. This gives equations of the form there stated. If a constant term be introduced, its “least square” value will be found to be zero, as above.
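For three variables the two equations obtained in this way can be solved directly. A minimal sketch, assuming deviations from the means and solving the $2 \times 2$ system by determinants; the function name is hypothetical.

```python
def partial_regressions(x1, x2, x3):
    """Least-squares b12.3 and b13.2 for x1 = b12.3*x2 + b13.2*x3 (deviations)."""
    def deviations(values):
        mean = sum(values) / len(values)
        return [v - mean for v in values]

    def product_sum(u, v):
        return sum(a * b for a, b in zip(u, v))

    x1, x2, x3 = deviations(x1), deviations(x2), deviations(x3)
    # Setting the derivatives of S(x1 - b12.3*x2 - b13.2*x3)^2 to zero gives
    #   S(x1 x2) = b12.3 S(x2^2)  + b13.2 S(x2 x3)
    #   S(x1 x3) = b12.3 S(x2 x3) + b13.2 S(x3^2)
    a11, a12, a22 = product_sum(x2, x2), product_sum(x2, x3), product_sum(x3, x3)
    c1, c2 = product_sum(x1, x2), product_sum(x1, x3)
    det = a11 * a22 - a12 * a12        # non-zero unless x2 and x3 are collinear
    b12_3 = (c1 * a22 - c2 * a12) / det
    b13_2 = (a11 * c2 - a12 * c1) / det
    return b12_3, b13_2

# Here x1 = 2*x2 - 3*x3 exactly, so the result should be (2, -3).
print(partial_regressions([-4, 1, 6, 5, 1], [1, 2, 3, 4, 5], [2, 1, 0, 1, 3]))
```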
III. THE LAW OF SMALL CHANCES.
(Supplementary to Chapter XV.)

WE have seen that the normal curve is the limit of the binomial $(p + q)^n$ when $n$ is large and neither $p$ nor $q$ very small. The student’s attention will now be directed to the limit reached when either $p$ or $q$ becomes very small, but $n$ is so large that either $np$ or $nq$ remains finite.

Let us regard the $n$ trials of the event, for which the chance of success at each trial is $p$, as made up of $m + m' = n$ trials; then the probability of having at least $m$ successes in the $m + m'$ trials is evidently the sum of the $m' + 1$ terms of the expansion of $(p + q)^n$ beginning with $p^n$. But this probability, which we may term $P_m$, can be expressed in another and more convenient form with the help of the following reasoning. The required result might happen in any one of $m' + 1$ ways. For instance:

(a) Each of the first $m$ trials might succeed; the chance of this is $p^m$.

(b) The first $m + 1$ trials might give $m$ successes and 1 failure, the latter not to happen on the $(m + 1)$th trial (a condition already covered by (a)). But the probability of $m$ successes and 1 failure, the latter at a specified trial, is $p^m q$, and, as the failure might occur in any one of $m$ out of $m + 1$ trials, the complete probability of (b) is $m p^m q$.

(c) The first $m + 2$ trials might give $m$ successes and 2 failures, the $(m + 2)$th trial not to be a failure (so as to avoid a repetition of either of the preceding cases); the probability of this is

$$\frac{m(m+1)}{2!}\,p^m q^2.$$

In a similar way we find for the contribution of $m + 3$ trials, giving $m$ successes and 3 failures,

$$\frac{m(m+1)(m+2)}{3!}\,p^m q^3.$$

        <pb n="393" />
        SUPPLEMENTS—THE LAW OF SMALL CHANCES.
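Ultimately we reach

$$P_m = p^m\left\{1 + mq + \frac{m(m+1)}{2!}q^2 + \frac{m(m+1)(m+2)}{3!}q^3 + \dots + \frac{m(m+1)\cdots(m+m'-1)}{m'!}q^{m'}\right\} \qquad (7)$$

This expression is of course equivalent to the first $m' + 1$ terms of the binomial expansion beginning with $p^n$, as the student can verify. For instance, if $m = n - 2$, so that $m' = 2$, we have

$$\begin{aligned} P_{n-2} &= p^{n-2}\left\{1 + (n-2)q + \frac{(n-2)(n-1)}{2!}q^2\right\} \\ &= p^{n-2}(1-q)^2 + n p^{n-2}(1-q)\,q + \frac{n(n-1)}{2!}p^{n-2}q^2 \\ &= p^n + n p^{n-1} q + \frac{n(n-1)}{2!}p^{n-2}q^2. \end{aligned}$$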
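Let us now suppose that $q$ is very small, so that $m'/n$, the ratio of
failures to total trials, is also very small. Let us also suppose
that $n$ is so large that $nq = \lambda$ is finite. Writing $q = \lambda/n$ and
putting $m = n - m'$, (7) becomes

$$P_{n-m'} = \left(1-\frac{\lambda}{n}\right)^{n}\left(1-\frac{\lambda}{n}\right)^{-m'}\left[1 + \lambda + \frac{\lambda^2}{2!} + \frac{\lambda^3}{3!} + \cdots + \frac{\lambda^{m'}}{m'!}\right],$$

since $m'/n$ and smaller fractions can be neglected.
But $\left(1-\frac{\lambda}{n}\right)^n$ is shown in books on algebra to be equal to $e^{-\lambda}$,
where $e$ is the base of the natural logarithms, when $n$ is infinite;
and, under similar conditions,

$$\left(1-\frac{\lambda}{n}\right)^{-m'} = 1.$$

Hence, if $n$ be large and $q$ small, we have

$$P_{n-m'} = e^{-\lambda}\left(1 + \lambda + \frac{\lambda^2}{2!} + \frac{\lambda^3}{3!} + \cdots + \frac{\lambda^{m'}}{m'!}\right) \qquad (8)$$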
If we put $m' = 0$, we have the chance that the event succeeds
every time, and (8) reduces to $e^{-\lambda}$. Put $m' = 1$, and we get the
chance that the event shall not fail more than once, $e^{-\lambda}(1+\lambda)$, so
that $e^{-\lambda}\lambda$ is the chance of exactly one failure; and the terms

        <pb n="394" />
        THEORY OF STATISTICS.
within the bracket, each multiplied by $e^{-\lambda}$, give us the proportional
frequencies of 0, 1, 2, etc. failures. In other words, (8) is the limit
of the binomial $(p+q)^n$ when $q$ is very small but $nq$ finite.
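A numerical check of (7) and (8) may help here. The following Python sketch is illustrative only: the function names and the values of $n$, $q$ and $m'$ are choices made for the example, not taken from the text. It evaluates $P_m$ from the series (7) with exact fractions, confirms that it equals the first $m'+1$ terms of the binomial beginning with $p^n$, and then sets the binomial beside Poisson’s limit (8) for $n$ = 1000, $q$ = ·002 (so $\lambda$ = 2).

```python
import math
from fractions import Fraction


def p_series_7(m, m_prime, p):
    """Chance of at least m successes in m + m' trials, from series (7)."""
    q = 1 - p
    coeff = Fraction(1)            # m(m+1)...(m+k-1)/k!, starting at k = 0
    total = Fraction(0)
    for k in range(m_prime + 1):
        if k > 0:
            coeff = coeff * (m + k - 1) / k
        total += coeff * q ** k
    return p ** m * total


def p_binomial(n, m_prime, p):
    """Sum of the first m' + 1 terms of (p + q)^n, beginning with p^n."""
    q = 1 - p
    return sum(math.comb(n, j) * p ** (n - j) * q ** j for j in range(m_prime + 1))


def p_poisson_8(lam, m_prime):
    """Poisson's limit (8): e^(-lambda) times the first m' + 1 terms of the bracket."""
    term, total = 1.0, 1.0
    for j in range(1, m_prime + 1):
        term *= lam / j
        total += term
    return math.exp(-lam) * total


# Exact agreement of (7) with the binomial terms, using fractions.
p = Fraction(2, 3)
print(p_series_7(8, 3, p) == p_binomial(11, 3, p))   # True

# Large n, small q: the binomial approaches (8) with lambda = nq.
n, q, m_prime = 1000, 0.002, 3
print(p_binomial(n, m_prime, 1 - q), p_poisson_8(n * q, m_prime))
```

The exact comparison uses `fractions.Fraction` so that the identity is checked without rounding error; in the second comparison the binomial and (8) agree to within about ·0002 for these values.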

The investigation contained in the preceding paragraphs was
published in 1837 by Poisson, so that (8) may be termed Poisson’s
limit to the binomial ; but the result has been reached indepen-
dently by several writers since Poisson’s time, and we shall give
one of the methods of proof adopted by modern statisticians, which
the student may perhaps find easier to follow than that of Poisson
(see ref. 19, p. 273).

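$$(p+q)^n = (1-q+q)^n = (1-q)^n\left(1+\frac{q}{1-q}\right)^n \qquad (9)$$

The first bracket on the right is equal to $e^{-\lambda}$ when $q$ is
indefinitely small. Expanding the second bracket, we have

$$1 + \frac{\lambda}{1-q} + \frac{\lambda(\lambda-q)}{2!\,(1-q)^2} + \frac{\lambda(\lambda-q)(\lambda-2q)}{3!\,(1-q)^3} + \cdots$$

The ratio of the $(r+1)$th to the $r$th term is

$$\frac{\lambda-(r-1)q}{r(1-q)}, \qquad (9a)$$

which reduces to $\lambda/r$ when $q$ is very small. The convergence of
the series is seen from the fact that $r$ cannot exceed $\lambda/q$, and the
substitution of this value in (9a) reduces it to

$$\frac{q^2}{\lambda(1-q)},$$

which vanishes with $q$.
Hence the second bracket on the right of (9) may be written

$$1 + \lambda + \frac{\lambda^2}{2!} + \frac{\lambda^3}{3!} + \cdots,$$

and (9) is

$$e^{-\lambda}\left(1 + \lambda + \frac{\lambda^2}{2!} + \frac{\lambda^3}{3!} + \cdots\right),$$

identical with (8).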

        <pb n="395" />

The frequent rediscovery of this theorem is due to the fact that
its value is felt in the study of problems involving small, independent
probabilities. For instance, if we desired to find the
distribution of $n$ things in $N$ pigeon-holes (all the pigeon-holes
being of equal size and equally accessible), $N$ being large, the
distribution given by the binomial

$$\left(\frac{1}{N} + \frac{N-1}{N}\right)^n$$

would be effectively represented by (8), with $\lambda = n/N$; tables of (8)
for different values of $\lambda$ have been published by v. Bortkewitsch and
others.
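The pigeon-hole case can be sketched as follows, assuming for illustration $n$ = 500 things and $N$ = 1000 holes, so that $\lambda = n/N$ = ·5. The helper names are hypothetical; each function returns the expected number of holes holding exactly $k$ things, first from the binomial and then from (8).

```python
import math


def holes_binomial(n, N, k):
    """Expected number of the N holes holding exactly k of the n things (exact binomial)."""
    return N * math.comb(n, k) * (1 / N) ** k * (1 - 1 / N) ** (n - k)


def holes_poisson(n, N, k):
    """The same expectation from Poisson's limit (8), with lambda = n / N."""
    lam = n / N
    return N * math.exp(-lam) * lam ** k / math.factorial(k)


n, N = 500, 1000
for k in range(5):
    print(k, round(holes_binomial(n, N, k), 2), round(holes_poisson(n, N, k), 2))
```

For these values the two columns differ by less than half a hole in every class.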

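The theorem has also been applied to cases in which, although
the actual value of $q$ (or $p$) is unknown, it may safely be assumed
to be very small. It should be noticed that, if (8) is the real law
of distribution, certain relations must obtain between the constants
of the statistics (see par. 12, Chapter XIII.). Using the
method of par. 6, Chapter XV., we have for the mean

$$e^{-\lambda}\left(\lambda + 2\frac{\lambda^2}{2!} + 3\frac{\lambda^3}{3!} + \cdots\right) = \lambda e^{-\lambda}\left(1 + \lambda + \frac{\lambda^2}{2!} + \cdots\right) = \lambda,$$

and for $\sigma^2$

$$\sigma^2 = e^{-\lambda}\left(\lambda + 2^2\frac{\lambda^2}{2!} + 3^2\frac{\lambda^3}{3!} + \cdots\right) - \lambda^2$$

$$= e^{-\lambda}\left(\lambda + \lambda^2 + \frac{\lambda^3}{2!} + \cdots\right) + e^{-\lambda}\left(\lambda^2 + 2\frac{\lambda^3}{2!} + 3\frac{\lambda^4}{3!} + \cdots\right) - \lambda^2$$

$$= \lambda + \lambda^2 e^{-\lambda}\left(1 + \lambda + \frac{\lambda^2}{2!} + \cdots\right) - \lambda^2$$

$$= \lambda + \lambda^2 - \lambda^2 = \lambda.$$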
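The two series can also be summed numerically. In this sketch (an illustration, not the text’s method) the sums are simply truncated after a fixed number of terms, which is ample for moderate $\lambda$; both the mean and $\sigma^2$ come out equal to $\lambda$ to rounding error.

```python
import math


def poisson_mean_and_variance(lam, terms=100):
    """Mean and variance of Poisson's limit (8), summed term by term."""
    prob = math.exp(-lam)          # frequency of r = 0
    mean = 0.0
    second_moment = 0.0
    for r in range(1, terms):
        prob *= lam / r            # e^(-lam) lam^r / r!
        mean += r * prob
        second_moment += r * r * prob
    return mean, second_moment - mean * mean


for lam in (0.61, 2.5):
    print(lam, poisson_mean_and_variance(lam))
```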

Hence any statistics produced by causes conforming to Poisson’s
limit should, within the limits of sampling, have the mean equal
to the square of the standard deviation. For instance, in the
statistics used in par. 12 of Chapter XIII., the mean is ·61,
$\sigma$ = ·78, $\sigma^2$ = ·6079.

        <pb n="396" />

If we now compute the theoretical frequencies from (8), putting
$\lambda$ = ·61, we have the following results:

Deaths.    Actual Frequency.    Frequency assigned by Poisson’s Limit.
   0              109                     108·7
   1               65                      66·3
   2               22                      20·2
   3                3                       4·1
   4                1                        ·7 (4 and over)
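The figures of this comparison can be reproduced in a few lines. The sketch below takes the actual frequencies from the table, computes the mean and $\sigma^2$, and then the frequencies assigned by (8) with $\lambda$ set equal to the observed mean, putting everything from 4 deaths upwards into the last class.

```python
import math

counts = {0: 109, 1: 65, 2: 22, 3: 3, 4: 1}      # deaths: actual frequency
total = sum(counts.values())                      # 200

mean = sum(k * f for k, f in counts.items()) / total
variance = sum(k * k * f for k, f in counts.items()) / total - mean ** 2

lam = mean
assigned = []
term = total * math.exp(-lam)                     # frequency of 0 deaths
for k in range(4):
    assigned.append(term)
    term *= lam / (k + 1)
assigned.append(total - sum(assigned))            # "4 and over"

print(round(mean, 4), round(variance, 4))         # 0.61 0.6079
print([round(x, 1) for x in assigned])            # [108.7, 66.3, 20.2, 4.1, 0.7]
```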

The agreement here is excellent, but such a concordance is not
very common in actual statistics. Cases do, however, occur in
which the method is of service, and the advanced student will find
that the reasoning illustrated is of value in many theoretical
investigations.

IV. GOODNESS OF FIT.

(Supplementary to Chapter XVII.)
IN par. 15, Chapter XV. (p. 308), it was remarked that the general
treatment of the problem, whether the discrepancies between
any system of observed frequencies and those postulated by a
theoretical law might have arisen by the operation of simple
sampling, was beyond the scope of this work. As, however, the
student will find in the course of his reading that a test of this
character is often applied in practical problems, the following
notes may be of service by way of comment on, or elucidation
of, the highly technical papers in which the subject is fully
discussed (see refs. 22 and 23, p. 315, and also additional
refs. on p. 394).

The student who has followed the argument leading up to
the table on p. 310 will have perceived that, when the frequency
distribution of a variable is known, the probability that a set of
observations departing from the most likely value would occur
can be evaluated by comparing the portion of area bounded by
the ordinate corresponding to the observed deviation with the
whole area of the theoretical curve, and the work is illustrated
in Examples i.-iv. of pp. 311-313. In this case there is only a
single variable, and the test for goodness of fit is reduced to its
simplest terms. But a consideration of Chapter XVI., and the

        <pb n="397" />
        SUPPLEMENTS—GOODNESS OF FIT. 371
relation there shown to hold between the normal curve and the
surface of normal correlation, at once suggests that the same
principle will apply when there are two variables.

It was proved on pp. 319-321 that the contours of a normal
surface are a system of concentric ellipses. Now suppose we
have a normal system of frequency in two variables $x$ and $y$;
then the chance that on simple sampling we should obtain the
combination $x'y'$ is measured by the corresponding ordinate of
the surface, and the feet of all ordinates of equal height will lie
upon an ellipse which will therefore be the locus of all combinations
of $x$ and $y$ equally likely to occur as is $x'y'$. Any combination
more likely to occur than $x'y'$ will have a taller ordinate,
and as the locus of its foot must also be an ellipse, that ellipse
will be contained within the $x'y'$ ellipse. Conversely, combinations
less likely to occur than $x'y'$ will be represented by
ordinates located upon ellipses wholly surrounding the $x'y'$
ellipse. Hence, if we dissect the surface into indefinitely thin
elliptical slices and determine the total volume of the
slices from $x = x'$ and $y = y'$ down to $x = 0$ and $y = 0$, this
volume divided by the total volume of the surface will be the
probability of obtaining in sampling a result not worse than
$x'y'$; or, if we prefer, we may sum from $x = x'$, $y = y'$ to
$x = y = \infty$, and then the fraction is the chance of obtaining as
bad a result as $x'y'$, or a worse result.
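For two variables the volume outside the ellipse through $(x', y')$, as a fraction of the whole, works out to $e^{-\chi^2/2}$, where $\chi^2$ is the quadratic form in the exponent of the normal surface. That closed form is a standard result stated here as an assumption, not quoted from the text. The sketch below checks it by simulation for a hypothetical standardised surface with correlation $r$ = ·5 and the point (1, 1); the function names, sample size and seed are illustrative.

```python
import math
import random


def chi_sq_2d(x, y, r):
    """Quadratic form of a standardised normal surface with correlation r."""
    return (x * x - 2 * r * x * y + y * y) / (1 - r * r)


def fraction_as_bad(x0, y0, r, samples=200_000, seed=1):
    """Monte Carlo share of samples lying on or outside the ellipse through (x0, y0)."""
    rng = random.Random(seed)
    limit = chi_sq_2d(x0, y0, r)
    worse = 0
    for _ in range(samples):
        u, v = rng.gauss(0, 1), rng.gauss(0, 1)
        x, y = u, r * u + math.sqrt(1 - r * r) * v    # correlated standardised pair
        if chi_sq_2d(x, y, r) >= limit:
            worse += 1
    return worse / samples


r = 0.5
print(fraction_as_bad(1.0, 1.0, r), math.exp(-chi_sq_2d(1.0, 1.0, r) / 2))
```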

The reader who has compared the figures on p. 166 and
p. 246, and followed the algebra of pp. 331-332, will have no
difficulty in seeing that, when the number of variables is
3, 4, ..., $n$, the above principle remains valid although it
ceases to be possible to give a graphic representation. With
three variables the contour ellipse becomes an ellipsoidal surface,
and the four-dimensioned frequency “volume” must be dissected
into tridimensional ellipsoids; with four variables another
dimension is involved, and so on; but throughout, the equation
of the contour of equal probability is of the ellipse type (cf. the
generalisation of the theorems of Chapter IX. in Chapter XII.).
Let us now suppose that if a certain set of data is derived
from a statistical universe conforming to a particular law, these
data, $N$ in number, should be distributed into $n+1$ groups containing
respectively $m_0, m_1, m_2, \ldots, m_n$ each. Instead of this
we actually find $m'_0, m'_1, m'_2, \ldots, m'_n$, where

$$m'_0 + m'_1 + \cdots + m'_n = m_0 + m_1 + \cdots + m_n = N.$$
The problem to be solved is whether the observed system of
deviations from the most probable values might have arisen in
        <pb n="398" />
random sampling. Since, $N$ being given, fixing the contents of
any $n$ of the classes determines the $(n+1)$th, there are only $n$
independent variables. Let us now suppose that the distribution
of deviations is normal. Then the equation of the frequency
“solid” is of the type set out in equation (15) of p. 331, which
we will write for the present in the form

$$z = k\,e^{-\frac{1}{2}\chi^2}.$$

$\chi^2$ = a constant is then the equation of the “ellipsoid” delimiting
the two portions of the “volume” corresponding to combinations
more or less likely to occur than $m'_0, m'_1, m'_2, \ldots, m'_n$.
Accordingly, to find the chance of a system of deviations as
probable as or less probable than that observed, we have to
dissect the frequency solid, adding together the elliptic elements
from the ellipsoid $\chi^2$ to the ellipsoid $\infty$, and to divide this
summation by the total volume, i.e. the summation from the
ellipsoid 0 to the ellipsoid $\infty$.

In this book we have been concerned with summations the
elements of which were finite. The reader is probably aware
that when the element summed is taken indefinitely small the
summation is called an integration, the symbol $\int$ replacing $\Sigma$ or $S$,
and the infinitesimal element being written $dx$. In the present
case we have to reduce an $n$-fold integral, the summation relating
to $n$ elements $dx_1, dx_2$, etc. To reduce this $n$-fold integral to a
single integral, the following method is adopted. In the first
place the ellipsoid, referred to its principal axes, is transformed
into a spheroid by stretching or squeezing, and the system of
rectangular co-ordinates transformed into polar co-ordinates.

The reason for adopting the latter device is that, when two
rectangular elements $dx, dy$ are transformed to polar co-ordinates,
we replace them by an angular element $d\theta$, a vectorial element $dr$,
and a term in $r$, the radius vector. When $n$ such elements are
transformed, the vectorial factor $r$ in the integrand is raised to the
$(n-1)$th power and there is an infinitesimal vectorial element, $dr$, and a
“solid” angular element. But as the limits of integration of
the angular (not of the vectorial) element will be the same in
the numerator and denominator, these cancel out, while $\chi$ may
be treated as the vectorial element or ray. Hence the multiple
integral reduces to a single integral and the expression becomes

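$$P = \frac{\displaystyle\int_{\chi}^{\infty} e^{-\frac{1}{2}\chi^2}\chi^{n-1}\,d\chi}{\displaystyle\int_{0}^{\infty} e^{-\frac{1}{2}\chi^2}\chi^{n-1}\,d\chi},$$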

        <pb n="399" />
the reduction of which, its integration, can be effected in terms
of $\chi$ by methods described in text-books of the integral calculus.
Everything turns, therefore, upon the computation of the function $\chi$.
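The single integral can also be evaluated numerically. In this sketch the numerator is integrated by Simpson’s rule up to a finite limit (the step count and the cut-off are assumptions chosen for accuracy), while the denominator uses the closed form $2^{n/2-1}\Gamma(n/2)$. With $n = n' - 1$ it should reproduce tabulated values of $P$, for instance the entries for $n'$ = 8 used in the dice examples of this Supplement.

```python
import math


def p_from_integral(chi_sq, n_prime, intervals=20000):
    """P as the ratio of the two integrals, with n = n' - 1 independent variables."""
    n = n_prime - 1
    chi = math.sqrt(chi_sq)

    def integrand(t):
        return math.exp(-t * t / 2.0) * t ** (n - 1)

    # Numerator by Simpson's rule from chi to a point far enough out that the rest
    # of the tail is negligible (assumed: 20 beyond the larger of chi and sqrt(n)).
    upper = max(chi, math.sqrt(n)) + 20.0
    h = (upper - chi) / intervals          # intervals must be even
    total = integrand(chi) + integrand(upper)
    for i in range(1, intervals):
        total += (4 if i % 2 else 2) * integrand(chi + i * h)
    numerator = total * h / 3.0

    # Denominator in closed form: the integral from 0 to infinity is 2^(n/2 - 1) Gamma(n/2).
    denominator = 2.0 ** (n / 2.0 - 1.0) * math.gamma(n / 2.0)
    return numerator / denominator


print(p_from_integral(5, 8), p_from_integral(6, 8))
```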

As we have seen, $\chi^2$ is determined by evaluating the standard
deviations of the $n$ variables and their correlations two at a time
(the higher partials being deducible if the correlations of zero
order are known).

By an application of the method of p. 257, we have

$$\sigma_p = \sqrt{m_p\left(1 - \frac{m_p}{N}\right)}$$

for the standard error of sampling in the content of the $p$th class;
while by a similar adaptation of the reasoning on p. 342 we reach

$$r_{pq} = -\sqrt{\frac{m_p m_q}{(N-m_p)(N-m_q)}}$$

for the correlation of errors of sampling in the $p$th and $q$th classes.
With these data, $\chi^2$ can be deduced (the actual process of reduction
is somewhat lengthy, but the student should have no difficulty
in following the steps given in pp. 370-2 of ref. 74, infra). Its
value is

$$\chi^2 = \sum_{p=0}^{n} \frac{(m'_p - m_p)^2}{m_p},$$

the summation extending to all $n+1$ classes of the frequency
distribution.
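The reduction of the correlational form to this simple sum can be checked numerically. The sketch below is illustrative and its helper names are hypothetical. It builds the variances $m_p(1 - m_p/N)$ and the covariances $-m_p m_q/N$ (that is, $r_{pq}\sigma_p\sigma_q$) for the first $n$ classes, evaluates the quadratic form in the deviations by Gaussian elimination, and compares the result with $\sum (m'_p - m_p)^2/m_p$ over all $n+1$ classes, using the first dice example of this Supplement.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    size = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda i: abs(aug[i][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for i in range(col + 1, size):
            factor = aug[i][col] / aug[col][col]
            for j in range(col, size + 1):
                aug[i][j] -= factor * aug[col][j]
    x = [0.0] * size
    for i in reversed(range(size)):
        known = sum(aug[i][j] * x[j] for j in range(i + 1, size))
        x[i] = (aug[i][size] - known) / aug[i][i]
    return x


def chi_sq_from_correlations(observed, expected):
    """Quadratic form in the deviations of the first n classes (the last is dropped)."""
    total = sum(expected)
    n = len(expected) - 1
    d = [observed[p] - expected[p] for p in range(n)]
    V = [[expected[p] * (1 - expected[p] / total) if p == q
          else -expected[p] * expected[q] / total
          for q in range(n)] for p in range(n)]
    w = solve(V, d)
    return sum(dp * wp for dp, wp in zip(d, w))


def chi_sq_pearson(observed, expected):
    """The simple sum over all n + 1 classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [0, 7, 60, 198, 430, 731, 948, 847, 536, 257, 71, 11, 0]
expected = [4096 * math.comb(12, k) / 2 ** 12 for k in range(13)]
print(chi_sq_from_correlations(observed, expected), chi_sq_pearson(observed, expected))
```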

Values of the probability that an equally likely or less likely
system of deviations will occur, usually denoted by the letter
$P$, have been computed for a considerable range of $\chi^2$ and of
$n' = n + 1$ = the number of classes, and are published in the Tables
for Statisticians and Biometricians mentioned on p. 358.

The arithmetical process is illustrated upon the two examples
of dice-throwing given on p. 258.

There are three points which the student should note as regards
the practical application of the method. In the first place, the
proof given assumes that deviations from the expected frequencies
follow the normal law. This is a reasonable assumption only if
no theoretical frequency is very small, for if it is very small the
distribution of deviations will be skew and not normal. It is
desirable, therefore, to group together the small frequencies in
the “tail” of the frequency distribution, as is done in the second
illustration below, so as to make the expected frequency a few
units at least. In the case of the first illustration it might have
been better to group the frequency of 0 successes with that of
        <pb n="400" />
Twelve Dice thrown 4096 times, a throw of 4, 5, or 6 points reckoned
a success (p. 258).

No. of       Observed      Expected Frequency      (m' − m)²    (m' − m)²/m
Successes.   Frequency     (m) = 4096(½ + ½)¹².
             (m').
    0             0               1                    1         1·0000
    1             7              12                   25         2·0833
    2            60              66                   36          ·5455
    3           198             220                  484         2·2000
    4           430             495                 4225         8·5354
    5           731             792                 3721         4·6982
    6           948             924                  576          ·6234
    7           847             792                 3025         3·8194
    8           536             495                 1681         3·3960
    9           257             220                 1369         6·2227
   10            71              66                   25          ·3788
   11            11              12                    1          ·0833
   12             0               1                    1         1·0000
 Totals        4096            4096                             34·5860 = χ²

From the tables we find:

   n'.    χ².       P.
   13     30     ·002792
   13     40     ·000072

Hence, by interpolation for χ² = 34·5860, P = ·0015.

Twelve Dice thrown 4096 times, a throw of 6 points reckoned a success.

No. of       Observed      Expected Frequency      (m' − m)²    (m' − m)²/m
Successes.   Frequency     (m) = 4096(⅙ + ⅚)¹².
             (m').
    0           447             459                  144          ·3137
    1          1145            1103                 1764         1·5993
    2          1181            1213                 1024          ·8442
    3           796             809                  169          ·2089
    4           380             364                  256          ·7033
    5           115             116                    1          ·0086
    6            24              27                    9          ·3333
 7 and over       8               5                    9         1·8000
 Totals        4096            4096                              5·8113 = χ²

From the tables we find:

   n'.    χ².       P.
    8      5     ·659963
    8      6     ·539750

Hence, by interpolation for χ² = 5·8113, P = ·5624.

        <pb n="401" />
1 success, and the frequency of 12 successes with that of 11
successes.
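Both tables can be recomputed with a short sketch. The helper `chi2_sf` evaluates $P$ from the closed forms of the integral for integer $n' - 1$ (a standard result, assumed here rather than taken from the text), so no interpolation is needed. Evaluated directly, $P$ for the first example comes out at about ·0005 rather than the interpolated ·0015, because $P$ falls off faster than linearly between $\chi^2$ = 30 and 40; for the second example it is about ·562, close to the interpolated ·5624. The conclusions drawn from these examples are unaffected.

```python
import math


def chi2_sf(x, df):
    """P for a value x of chi-square with df = n' - 1 (a positive integer)."""
    half = x / 2.0
    if df % 2 == 0:
        # even df: e^(-x/2) [1 + x/2 + (x/2)^2/2! + ...], df/2 terms
        term = total = 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) e^(-x/2) [1 + x/3 + x^2/(3*5) + ...]
    term, total = 1.0, 0.0
    for j in range((df - 1) // 2):
        if j > 0:
            term *= x / (2 * j + 1)
        total += term
    return math.erfc(math.sqrt(half)) + math.sqrt(2 * x / math.pi) * math.exp(-half) * total


def chi_sq(observed, expected):
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# 4, 5 or 6 points a success: expected frequencies are 4096 C(12, k) / 2^12.
obs_a = [0, 7, 60, 198, 430, 731, 948, 847, 536, 257, 71, 11, 0]
exp_a = [4096 * math.comb(12, k) / 2 ** 12 for k in range(13)]

# Six points a success, last class "7 and over"; expected values as printed (rounded).
obs_b = [447, 1145, 1181, 796, 380, 115, 24, 8]
exp_b = [459, 1103, 1213, 809, 364, 116, 27, 5]

for observed, expected in ((obs_a, exp_a), (obs_b, exp_b)):
    x = chi_sq(observed, expected)
    n_prime = len(observed)
    print(round(x, 4), n_prime, chi2_sf(x, n_prime - 1))
```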

In the second place, the proof outlined assumes that the
theoretical law is known a priori. In a large number, perhaps
almost the majority, of practical cases in which the test is applied
this condition is not fulfilled. We determine, for example,
the constants of a frequency curve from the observations themselves,
not from a priori considerations: we determine the
“independence values” of the frequencies for a contingency
table from the given row and column totals, again not from
a priori considerations. This general case is dealt with below,
in the section headed “Comparison Frequencies based on the
Observations.”

Finally, attention should be paid to the run of the signs of
the differences $m' - m$. The method used pays no attention to
the order of these signs, and it may happen that $\chi^2$ has quite a
moderate value and $P$ is not small when all the positive differences
are on one side of the mode and all the negative differences on the
other, so that the mean shows a deviation from the expected value
that is quite outside the limits of sampling, or that the differences
are negative in both tails so that the standard deviation shows
an almost impossible divergence from expectation. In the first
example on the preceding page all the differences are negative up to
5 successes, positive from 6 to 10 successes, and negative again for
11 and 12 successes. This is almost the first case supposed, and
in fact we have already found (p. 267) that the mean deviates
from the expected value by 5·1 (more precisely 5·13) times its standard
error. From Table II. of Tables for Statisticians we have:

Greater fraction of the area of a normal
  curve for a deviation 5·13 . . . . . ·9999998551
Area in the tail of the curve  . . . . ·0000001449
Area in both tails . . . . . . . . . . ·0000002898

so that the probability of getting such a deviation (+ or −) on
random sampling is only about 3 in 10,000,000. The value found
for $P$ (·0015) by the grouping used is therefore in some degree
misleading. If we regroup the distribution according to the
signs of $m' - m$, we find:
No. of        Observed      Expected
Successes.    Frequency.    Frequency.
   0-5           1426          1586
   6-10          2659          2497
  11-12            11            13
  Total          4096          4096

        <pb n="402" />

For this comparison $n'$ is 3, $\chi^2$ is 26·96, or practically 27, and $P$
is about ·000001, a value much more nearly in accordance with
that suggested by the mean.
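The regrouping by runs of sign is easily mechanised. In this sketch (the function name is hypothetical, and a zero difference is counted as a defect, an assumption the text does not need to make) adjacent classes are pooled while $m' - m$ keeps the same sign. With three classes, that is two degrees of freedom, the integral for $P$ reduces to $e^{-\chi^2/2}$. Applied to the sixes example, the same function yields the six groups of the regrouped table that follows in the text.

```python
import math


def regroup_by_sign_runs(observed, expected):
    """Pool adjacent classes whose differences m' - m fall on the same side.

    A zero difference is counted with the defects (an assumption of this sketch).
    """
    groups = []
    previous = None
    for o, e in zip(observed, expected):
        excess = o > e
        if groups and excess == previous:
            groups[-1][0] += o
            groups[-1][1] += e
        else:
            groups.append([o, e])
        previous = excess
    return [tuple(g) for g in groups]


observed = [0, 7, 60, 198, 430, 731, 948, 847, 536, 257, 71, 11, 0]
expected = [math.comb(12, k) for k in range(13)]     # 4096 C(12, k) / 2^12
groups = regroup_by_sign_runs(observed, expected)
chi_sq = sum((o - e) ** 2 / e for o, e in groups)
# With n' = 3 classes (two degrees of freedom), P reduces to exp(-chi_sq / 2).
p_value = math.exp(-chi_sq / 2)
print(groups)            # [(1426, 1586), (2659, 2497), (11, 13)]
print(round(chi_sq, 2), p_value)
```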

Such a regrouping of the frequency distribution by the runs of
classes that are in excess and in defect of expectation would appear
often to afford a useful and severe test of the real extent of agreement
between observation and theory. In the second example
the signs are fairly well scattered, and the regrouping has a comparatively
small effect, the mean being in almost precise agreement
with expectation. The regrouped distribution is:

No. of        Observed      Expected
Successes.    Frequency.    Frequency.
    0            447           459
    1           1145          1103
   2-3          1977          2022
    4            380           364
   5-6           139           143
7 and over         8             5
  Total         4096          4096

Here $n'$ is 6, $\chi^2$ is 5·52, and $P$ is 0·36, so that the deviations from
expectation are still well within the range of fluctuations of
sampling.

The value of $P$ is the probability that a set of observations
will occur giving a group of deviations from theory, i.e. a value
of $\chi^2$, which is more improbable than that observed. If, to take
the second illustration above, we were to repeat 4096 throws of
twelve dice a large number of times, noting the throws of sixes,
we should expect to get a worse fit to theory, i.e. a value of $\chi^2$
greater than 5·81, roughly speaking 56 times in every hundred
trials.
The value of $P$ corresponding to $\chi^2 = 0$ is necessarily unity,
for it is certain that all values of $\chi^2$ must exceed zero. If the
value of $P$ corresponding to $\chi^2 = 1$ is $P_1$, then $1 - P_1$ is the
frequency of values of $\chi^2$ between 0 and 1. Similarly, if the
value of $P$ corresponding to $\chi^2 = 2$ is $P_2$, then the frequency of
values of $\chi^2$ between 1 and 2 is $P_1 - P_2$, and so on. Thus, for
16 classes ($n'$ = 16), we find in the tables:

        <pb n="403" />
  χ².        P.        Differences of P.
   0      1·000000        ·007873
   5       ·992127        ·172388
  10       ·819739        ·368321
  15       ·451418        ·279486
  20       ·171932        ·171932
We should expect, therefore, in, say, 1000 sets of random
sampling with 16 classes, about 8 cases of $\chi^2$ between 0 and 5,
about 172 cases between 5 and 10, 368 between 10 and 15,
279 between 15 and 20, and 172 over 20. The following table
shows the results obtained for the more modest number of 100
sets of trials, and gives very fair agreement with theory, especially
considering that the assumption of normality can hardly be
strictly true. The trials were carried out by throwing 200
beans into a revolving circular tray with sixteen equal radial
compartments, and counting the number of beans in each compartment.
The value of $\chi^2$ was then computed, taking the
expected frequency as 200/16 = 12·5.
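The expected counts per 1000 sets quoted above can be reproduced from the closed forms of the $P$ integral for $n'$ = 16, that is 15 degrees of freedom. This is a sketch: `chi2_sf` is a helper written for the example (a standard closed form for integer degrees of freedom), not a routine from the Tables.

```python
import math


def chi2_sf(x, df):
    """P for a value x of chi-square with df = n' - 1 (a positive integer)."""
    half = x / 2.0
    if df % 2 == 0:
        term = total = 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    term, total = 1.0, 0.0
    for j in range((df - 1) // 2):
        if j > 0:
            term *= x / (2 * j + 1)
        total += term
    return math.erfc(math.sqrt(half)) + math.sqrt(2 * x / math.pi) * math.exp(-half) * total


df = 16 - 1                                   # n' = 16 classes
limits = [0, 5, 10, 15, 20]
p = [chi2_sf(x, df) for x in limits]
diffs = [a - b for a, b in zip(p, p[1:])] + [p[-1]]   # last entry: chi-square over 20
print([round(v, 6) for v in p])
print([round(1000 * d) for d in diffs])       # [8, 172, 368, 279, 172]
```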
Number of Tables giving a Value of χ² lying between the Limits on the Left.

   χ².          Expected.    Observed.
   0-5             0·8           0
   5-10           17·2          20
  10-15           36·8          36
  15-20           27·9          30·5
  20 upwards      17·2          13·5
If we treat this in its turn as a comparison of observation with
theory, we find, bracketing the first two groups together, so as
to reduce the number of classes to four, $\chi^2$ = 1·28, whence from
the tables $P$ is approximately 0·74. That is to say, we should
expect a worse agreement with theory about three times out
of four.
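The arithmetic of this comparison is short enough to check directly. The sketch pools the first two groups as described and, for $n'$ = 4 (three degrees of freedom), uses the closed form $P = \mathrm{erfc}(\sqrt{\chi^2/2}) + \sqrt{2\chi^2/\pi}\,e^{-\chi^2/2}$, a standard result assumed here. Direct evaluation gives $P$ of about ·73; the ·74 above comes from interpolating in the tables.

```python
import math

# Four classes after bracketing 0-5 with 5-10 (figures from the table above).
expected = [0.8 + 17.2, 36.8, 27.9, 17.2]
observed = [0 + 20, 36, 30.5, 13.5]
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# n' = 4, i.e. three degrees of freedom: closed form of the P integral.
p_value = (math.erfc(math.sqrt(chi_sq / 2))
           + math.sqrt(2 * chi_sq / math.pi) * math.exp(-chi_sq / 2))
print(round(chi_sq, 2), round(p_value, 2))
```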

It follows from what was said above that, in any series of trials
by simple sampling, equal numbers of cases should be found within
equal intervals of $P$, e.g. from 1·0 to 0·9, from 0·9 to 0·8, from
0·8 to 0·7, and so on. The frequency distribution of $P$, that is to

        <pb n="404" />
say, when we fulfil the conditions of simple sampling, is uniform
over the whole range from 0 to 1. Thus for a rough grouping
into four classes the above series of trials gave:

Number of Tables giving a Value of P lying between the Limits on the Left.

   P.             Expected.    Observed.
   1·00-0·75         25           23
   0·75-0·50         25           30
   0·50-0·25         25           22
   0·25-0            25           25

The value of $\chi^2$ for this comparison is 1·52, giving $P$ = 0·68, or
we should expect a worse fit roughly twice in every three trials.
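The bean experiment can be imitated on a computer. In this hypothetical simulation `random.randrange` stands in for the revolving tray, and 2000 sets replace the 100 actually made (both choices, and the seed, are assumptions of the sketch). For each set $\chi^2$ is computed with expected frequency 12·5, and $P$ from the closed form for 15 degrees of freedom; if $P$ is uniformly distributed the four quarter-range counts should be roughly equal.

```python
import math
import random


def chi2_sf(x, df):
    """P for a value x of chi-square with df = n' - 1 (a positive integer)."""
    half = x / 2.0
    if df % 2 == 0:
        term = total = 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    term, total = 1.0, 0.0
    for j in range((df - 1) // 2):
        if j > 0:
            term *= x / (2 * j + 1)
        total += term
    return math.erfc(math.sqrt(half)) + math.sqrt(2 * x / math.pi) * math.exp(-half) * total


def bean_trial(rng, beans=200, compartments=16):
    """Drop beans at random into equal compartments; return chi-square about 200/16."""
    counts = [0] * compartments
    for _ in range(beans):
        counts[rng.randrange(compartments)] += 1
    expected = beans / compartments
    return sum((c - expected) ** 2 / expected for c in counts)


rng = random.Random(12345)
trials = 2000
quarters = [0, 0, 0, 0]          # P in 1-0.75, 0.75-0.5, 0.5-0.25, 0.25-0
for _ in range(trials):
    p = chi2_sf(bean_trial(rng), 15)          # n' = 16, so 15 degrees of freedom
    quarters[min(3, int((1 - p) * 4))] += 1
print(quarters)                  # each should be near trials / 4
```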
COMPARISON FREQUENCIES BASED ON THE
OBSERVATIONS.

Contingency Tables.— Attention was specially directed above
to the fact that the theoretical frequencies were assumed to be
given a priori. The theory of the more general case, in which
comparison is made with frequencies determined by the aid of the
observations themselves, has only recently been fully worked out
(Fisher, ref. 76). The most important practical case of the
kind is that of association or contingency tables in which the
observed frequencies are compared with the independence-values
obtained from the totals of rows and columns, that is, the values

$$(A_m B_n)_0 = \frac{(A_m)(B_n)}{N}$$

of Chapter V. § 6, p. 64, and in which the differences

$$\delta_{mn} = (A_m B_n) - (A_m B_n)_0$$

are used as an indication of the divergence from independence.
The rule to which the theory leads is a very simple one: the $\chi^2$
method is still applicable, but the tables must be entered with $n'$
equal to the number of algebraically independent frequencies (or
values of $\delta$) increased by unity, and not with $n'$ equal to the
number of compartments in the table. Now, if in any column
of the contingency table we are given all the values of $\delta$ but one
(say, the marginal value at the bottom), the remaining one can be
determined, because the sum of the $\delta$’s for every column must be

        <pb n="405" />
zero. The same statement must hold good for every row. Hence,
if $r$ be the number of rows, $c$ the number of columns, the number
of algebraically independent values of $\delta$ is $(r-1)(c-1)$, and the
tables must be entered with the value

$$n' = (r-1)(c-1) + 1.$$

The student will realise that this is a reasonable rule if he
considers that when we take $n'$ as the number of classes, the
comparison frequencies being given a priori, we are taking it as
one more than the number of algebraically independent frequencies,
since the total number of observations is fixed.

The following will serve as an illustration (Yule, ref. 5 of
Chapter V.). Sixteen pieces of photographic paper were printed
down to different depths of colour from nearly white to a very
deep blackish brown. Small scraps were cut from each sheet and
pasted on cards, two scraps on each card one above the other,
combining scraps from the several sheets in all possible ways, so
that there were 256 cards in the pack. Twenty observers then
went through the pack independently, each one naming each tint
either “light,” “medium,” or “dark.”

TABLE showing the Name (light, medium, or dark) assigned to each of two
Pieces of Photographic Paper on a Card: 256 Cards and 20 Observers.
Upper figure, observed frequency; central figure, independence frequency;
bottom figure, difference δ. (Yule, ref. 5 of Chap. V., Table XXI.)

Name assigned to        Name assigned to Upper Tint on Card.
Lower Tint on Card.    Light.    Medium.    Dark.     Total.
Light                   850       571       580       2001
                        785       633       583
                        +65       −62        −3
Medium                  618       593       455       1666
                        653       527       486
                        −35       +66       −31
Dark                    540       456       457       1453
                        570       460       423
                        −30        −4       +34
Total                  2008      1620      1492       5120
        <pb n="406" />
4225/785 . . . . 5·38
3844/633 . . . . 6·07
   9/583 . . . .  ·02
1225/653 . . . . 1·88
4356/527 . . . . 8·27
 961/486 . . . . 1·98
 900/570 . . . . 1·58
  16/460 . . . .  ·03
1156/423 . . . . 2·73
Total χ² . . . . 27·94
n' . . . . . . .  5
P  . . . . . . . ·000012

The results are shown in the preceding table, the upper figure in
each compartment of the table being the observed frequency of
the corresponding pair of names. Below the observed frequency
are given the independence frequency $(A_m B_n)_0$ and the difference
$\delta_{mn}$. It will be seen that the observed figures are not very close
to the independence-values, there being apparently a marked
tendency to give the same names to the two tints on any card, so
that all the diagonal frequencies are in excess of the independence-values
and all the others in defect.

Working out $\chi^2$ as shown, the total comes to 27·94, or practically
28. Since $r$ and $c$ are both 3, $n'$ must be taken as $(2 \times 2) + 1$,
that is, 5. Turning up the tables in the column $n'$ = 5, we find
$P$ = ·000012; that is to say, we would only expect to find so great
a divergence from independence, in random sampling, a little
more than once in 100,000 trials, so the result is certainly
significant.
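The calculation can be repeated from the observed frequencies alone. This sketch computes the independence-values from the row and column totals without rounding, so $\chi^2$ comes out at about 27·85 rather than the 27·94 obtained from the rounded values in the table; with $n'$ = 5, $P$ is still about ·000013. The helper names are illustrative, and `chi2_sf` is the standard closed form for integer degrees of freedom, assumed here.

```python
import math


def chi2_sf(x, df):
    """P for a value x of chi-square with df = n' - 1 (a positive integer)."""
    half = x / 2.0
    if df % 2 == 0:
        term = total = 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    term, total = 1.0, 0.0
    for j in range((df - 1) // 2):
        if j > 0:
            term *= x / (2 * j + 1)
        total += term
    return math.erfc(math.sqrt(half)) + math.sqrt(2 * x / math.pi) * math.exp(-half) * total


def independence_chi_sq(table):
    """Chi-square for divergence from independence, with n' = (r - 1)(c - 1) + 1."""
    rows = [sum(row) for row in table]
    cols = [sum(col) for col in zip(*table)]
    total = sum(rows)
    chi_sq = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = rows[i] * cols[j] / total      # (A_m B_n)_0
            chi_sq += (observed - expected) ** 2 / expected
    n_prime = (len(rows) - 1) * (len(cols) - 1) + 1
    return chi_sq, n_prime


# Rows: name given to the lower tint; columns: name given to the upper tint.
tints = [
    [850, 571, 580],    # light
    [618, 593, 455],    # medium
    [540, 456, 457],    # dark
]
chi_sq, n_prime = independence_chi_sq(tints)
print(round(chi_sq, 2), n_prime, chi2_sf(chi_sq, n_prime - 1))
```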

Association Tables.— When we are dealing with an association
table there are only two rows and two columns, and consequently
n’ must be taken as $(2-1)(2-1) + 1$, that is, 2. But no column
for n’ = 2 is given in Tables for Statisticians and Biometricians, the
lowest value taken being n’= 3, and a supplementary table (XV. c)
is not sufficiently detailed: the necessary table, reprinted by
permission from the Journal of the Royal Statistical Society
(ref. 77), will be found at the end of this Supplement. As will
be seen from the following illustrations, the required probability
can also be determined from the table of areas of the normal
curve, but it is very convenient to keep the arithmetic in the
usual form.

Example i.— (Data from Chapter III, p. 37.) The following
data are there cited for colour of flower and prickliness of fruit in
Datura: the independence-frequencies have been entered below
the numbers of observations.
        <pb n="407" />
                      Fruit.
Flower.        Prickly.    Smooth.    Total.
Violet            47          12         59
               48·337      10·663
White             21           3         24
               19·663       4·337
Total             68          15         83

Here $\delta$ is 1·337, and

$$\chi^2 = (1.337)^2\left(\frac{1}{48.337} + \frac{1}{10.663} + \frac{1}{19.663} + \frac{1}{4.337}\right) = 0.708.$$

Turning up this value of $\chi^2$ in the table on p. 385, we find by
interpolation $P$ = ·400. As stated in the text, the association,
negative in this case, is “so small that no stress can be laid on it
as indicating anything but a fluctuation of sampling.”

Precisely the same result can be arrived at by working out the
standard error of the difference between the proportions of violet
and of white flowers that have smooth fruits, taking the ratio of
the difference to its standard error and then using the table of
areas of the normal curve. Thus:—

Proportion of violet flowers that have smooth
  fruits, 12/59 or . . . . . . . . . . . . . . ·2033
Proportion of white flowers that have smooth
  fruits, 3/24 or  . . . . . . . . . . . . . . ·1250
Difference . . . . . . . . . . . . . . . . . . ·0783
Proportion of all flowers that have smooth fruits,
  15/83 or . . . . . . . . . . . . . . . . . . ·1807

Standard error of the difference between proportions of smooth
fruits in sampling from a universe in which the proportions are
·1807 and ·8193, and the numbers in the samples 59 and 24
respectively:

$$\sqrt{0.8193 \times 0.1807\left(\frac{1}{59} + \frac{1}{24}\right)} = 0.0932.$$

Hence the ratio of the observed difference to its standard error is
·0783/·0932 or ·840.

        <pb n="408" />

Interpolating in the table of areas of the normal curve on
p. 310, or taking the required figure directly from Table II. of
Tables for Statisticians, we have:

Greater fraction of area for a deviation of ·84 in
  the normal curve . . . . . ·7995
Area in the tail . . . . . . ·2005
Area in both tails . . . . . ·401

That is to say, the probability of getting a difference, of either
sign, as great as or greater than that actually observed is ·401,
agreeing, within the accuracy of the arithmetic, with the
probability given by the $\chi^2$ method.

The same result would again have been obtained had we worked
from the columns instead of from the rows, and considered the
difference between the proportions of white flowers for prickly and
for smooth fruits respectively.

Example ii.—(Data from ref. 6 of Chapter III, Table XIV.)
The following table shows the result of inoculation against cholera
on a certain tea estate :—

                  Not-attacked.   Attacked.   Total.
Inoculated             431             5         436
                      427·7           8·3
Not-inoculated         291             9         300
                      294·3           5·7
Total                  722            14         736

As in the last example, the independence-frequencies have been
given below the numbers observed. The value of $\delta$ is 3·3, and

$$\chi^2 = \delta^2\left(\frac{1}{427.7} + \frac{1}{8.3} + \frac{1}{294.3} + \frac{1}{5.7}\right) = 3.27.$$

From the table on p. 386 $P$ is ·0706.

Working from the proportions attacked, we can arrive at the
same result.

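Proportion attacked amongst inoculated . . . . . ·01147
Proportion attacked amongst not-inoculated . . . ·03000
Difference . . . . . . . . . . . . . . . . . . . ·01853

The standard error of the difference is

$$\sqrt{0.98098 \times 0.01902\left(\frac{1}{436} + \frac{1}{300}\right)} = 0.01025.$$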

        <pb n="409" />
The ratio of the difference to its standard error is therefore
·01853/·01025, or 1·808.

Greater fraction of normal curve for a deviation of 1·808 is ·96470
Fraction in tail . . . . . . . . ·03530
Fraction in the two tails  . . . ·07060

As before, both methods must lead to the same result.
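The agreement of the two methods is no accident: for a table with two rows and two columns, $\chi^2$ computed from the independence-values is exactly the square of the ratio of the difference of proportions to its standard error, when that standard error is based on the pooled proportion as above. The sketch below checks this on both examples; the function name is hypothetical, and the cholera frequencies (5 attacked out of 436 inoculated, 9 out of 300 not inoculated) are those of the table above.

```python
import math


def association_checks(a, b, c, d):
    """Rows (a, b) and (c, d): chi-square from the independence-values, and the ratio
    of the difference of the second-column proportions to its standard error."""
    n1, n2 = a + b, c + d
    total = n1 + n2
    col1, col2 = a + c, b + d
    chi_sq = 0.0
    for observed, row, col in ((a, n1, col1), (b, n1, col2), (c, n2, col1), (d, n2, col2)):
        expected = row * col / total
        chi_sq += (observed - expected) ** 2 / expected
    pooled = col2 / total
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (b / n1 - d / n2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))       # area in both tails
    return chi_sq, z, p_value


datura = association_checks(47, 12, 21, 3)      # violet, white by prickly, smooth
cholera = association_checks(431, 5, 291, 9)    # inoculated, not by not-attacked, attacked
for chi_sq, z, p_value in (datura, cholera):
    print(round(chi_sq, 4), round(z * z, 4), round(p_value, 4))
```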

An Aggregate of Tables.—It may often happen that we have
formed a number of contingency or association tables—more
often the latter than the former—for similar data from different
fields. All may give, perhaps, a positive association, but the
values of P may run so high that we do not feel any great con-
fidence even in the aggregate result. The question then arises
whether we cannot obtain a single value of P for the aggregate as
a whole, telling us what is the probability of getting by mere
random sampling a series of divergences from independence as
great as or greater than those observed. The question is usually
answered by pooling the tables; but, in view of the fallacies that
may be introduced by pooling (cf. Chapter IV. §§ 6 and 7), this
method is not quite satisfactory. A better answer is given by the
application of the present general rule. Add up all the values of
$\chi^2$ for the different tables, thus obtaining the value of $\chi^2$ for the
aggregate, and enter the $P$-tables with a value of $n'$ equal to the
total of algebraically independent frequencies increased by unity;
that is, take $n'$ as given by

$$n' = 1 + \sum (r-1)(c-1).$$

For the association table there is only one algebraically independent
value of $\delta$. Hence if we are testing the divergence from
independence of an aggregate of association tables, we must add
together the values of $\chi^2$ and enter the $P$-tables with $n'$ taken as
one more than the number of tables in the aggregate.

Thus from ref. 6 of Chapter III., from which the data of
Example ii. were cited, we take the following values of $\chi^2$ and of
$P$ for six tables that include that example. They refer to six
different estates in the same group.

   χ².       P.
   9·34    ·0022
   6·08    ·014
   2·51    ·11
   3·27    ·071
   5·61    ·018
   1·59    ·21

Total 28·40

        <pb n="410" />

The association between inoculation and protection from attack
is positive for each estate, but for only one of the tables is the
value of $P$ so small that we can say the result is very unlikely to
have arisen as a fluctuation of sampling. Adding up the values
of $\chi^2$, the total is 28·40, and entering the column for $n'$ = 7 (one
more than the number of tables considered), we find

   χ².      P.
   28     ·000094
   29     ·000061

whence by interpolation the value of $P$ is ·000081, i.e. we should
only expect to get a total of $\chi^2$’s as great as or greater than this, on
random sampling, 81 times in 1,000,000 trials. We can therefore
regard the results as significant with a high degree of confidence.

We may, I think, go further: for all the observed associations
are positive, and in six cases there are 2⁶ or 64 possible permutations
of sign. We should therefore only expect to get an equal
or greater total value of χ² and tables all showing positive association,
not 81 times in 1,000,000 trials but 81/64 or, roundly, 1.3
times. P for the observed event (Σ(χ²) = 28.4 and all associations
positive) is therefore only 0.0000013.
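The interpolation in the P-table can be checked by computing P directly. The sketch below is an addition, not Yule's procedure. It assumes the standard closed forms of the χ² integral for n' - 1 degrees of freedom: a finite exponential sum when n' - 1 is even, and the complementary error function plus a finite series when it is odd. It uses only Python's standard library, and its helper chi2_p is a name of our own. The first of the six values of χ², 9.34, is taken as the difference between the stated total 28.40 and the other five.

```python
import math


def chi2_p(chi2, n_prime):
    """Chance of a chi-square as great as or greater than `chi2` when the
    table is entered with Yule's n' (that is, n' - 1 degrees of freedom)."""
    if 2 > n_prime:
        raise ValueError("n' must be at least 2")
    df = n_prime - 1
    half = chi2 / 2.0
    if df % 2 == 0:
        # Even degrees of freedom: exp(-x/2) times the sum of (x/2)**k / k!
        # for k = 0 .. df/2 - 1.
        term = total = 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd degrees of freedom: erfc(sqrt(x/2)) plus
    # sqrt(2/pi) * exp(-x/2) * (x**0.5 + x**1.5/3 + x**2.5/(3*5) + ...),
    # the series having (df - 1)/2 terms.
    term, total = math.sqrt(chi2), 0.0
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= chi2 / (2 * j + 1)
    return math.erfc(math.sqrt(half)) + math.sqrt(2 / math.pi) * math.exp(-half) * total


# Values of chi-square for the six estates.
chi2_values = [9.34, 6.08, 2.51, 3.27, 5.61, 1.59]
total = sum(chi2_values)                 # 28.40
n_prime = len(chi2_values) + 1           # one more than the number of tables

exact = chi2_p(total, n_prime)
# Linear interpolation between the tabled values for chi-square = 28 and 29.
interp = 0.000094 + (total - 28) * (0.000061 - 0.000094)
# Only one of the 2**6 = 64 patterns of sign has every association positive.
all_positive = exact / 2 ** len(chi2_values)

print(f"total = {total:.2f}, n' = {n_prime}")
print(f"P (direct) = {exact:.7f}, P (interpolated) = {interp:.7f}")
print(f"P with all six associations positive = {all_positive:.7f}")
```

On these assumptions P comes out at about 0.000079, slightly below the interpolated 0.000081. Linear interpolation overstates P a little here because P falls off along a convex curve between the tabled points. The allowance for sign then gives about 0.0000012 instead of 0.0000013, and the conclusion is unaffected.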

Experimental Illustrations of the General Case. The formulae
for the general case, as for the special case in which the frequencies
with which comparison is made are given a priori, can be checked
by experiment.

The numbers of beans counted in each of the sixteen compartments
of the revolving circular tray mentioned on p. 374 above
were entered as the frequencies of a table (1) with 4 rows and
4 columns, (2) with 2 rows and 8 columns, and the value of χ²
computed for each table for divergence from independence. For
the two cases we have

$$n' = (3 \times 3) + 1 = 10$$

and

$$n' = (1 \times 7) + 1 = 8$$

respectively. Differencing the columns for P corresponding to
these two values of n', we obtain the theoretical frequency-distributions
given in the columns headed “Expectation” in Table A.
The observed distributions of the values of χ² in 100 experimental
tables are given in the columns headed “Observation.” It will be
seen that the agreement between expectation and observation is
excellent for so small a number of observations. If the goodness
of fit be tested by the χ² method, grouping together the frequencies
from χ² = 15 upwards, so that n' is 4, χ² is found to be 2.27 for
the 4 × 4 tables and 4.36 for the 2 × 8 tables, giving P = 0.52 in
the first case and 0.22 in the second.
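Differencing the P column can be sketched in the same way. The sketch below uses a helper chi2_p of our own, built from the standard closed forms rather than Yule's printed tables. With it, the expected numbers of the 100 tables falling in the classes 0-5, 5-10, 10-15, 15-20 and 20 upwards are:

```python
import math


def chi2_p(chi2, n_prime):
    """Chance of a chi-square as great as or greater than `chi2`, entered
    with Yule's n' (n' - 1 degrees of freedom), from standard closed forms."""
    df = n_prime - 1
    half = chi2 / 2.0
    if df % 2 == 0:
        term = total = 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    term, total = math.sqrt(chi2), 0.0
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= chi2 / (2 * j + 1)
    return math.erfc(math.sqrt(half)) + math.sqrt(2 / math.pi) * math.exp(-half) * total


def expected_counts(edges, n_prime, n_tables):
    """Expected numbers of tables in the classes edges[0]-edges[1], ...,
    with the last class open above, found by differencing P."""
    p = [chi2_p(x, n_prime) for x in edges]
    counts = [n_tables * (a - b) for a, b in zip(p, p[1:])]
    counts.append(n_tables * p[-1])    # open top class: P at its lower limit
    return counts


edges = [0, 5, 10, 15, 20]
four_by_four = expected_counts(edges, 10, 100)   # n' = (3 x 3) + 1
two_by_eight = expected_counts(edges, 8, 100)    # n' = (1 x 7) + 1
for lower, e44, e28 in zip(edges, four_by_four, two_by_eight):
    print(f"{lower:>3}-  {e44:6.1f}  {e28:6.1f}")
```

The results are about 16.6, 48.4, 26.0, 7.3 and 1.8 for n' = 10, and about 34.0, 47.1, 15.3, 3.0 and 0.6 for n' = 8. These agree with the expectations printed in Table A.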

        <pb n="411" />

TABLE A. Theoretical Distribution of χ², calculated from Independence-values,
in Tables with 16 Compartments, compared with the Actual Distributions
given by 100 Experimental Tables. In the first case n' must be taken as
10, in the second as 8. (Ref. 77.)

                 4 Rows, 4 Columns           2 Rows, 8 Columns
Value of χ²      Expectation  Observation    Expectation  Observation
0-5                 16.6          17            34.0         29.5
5-10                48.4          44            47.1         56.5
10-15               26.0          32            15.3         10
15-20                7.3           8             3.0          3
20-                  1.8                         0.6          1
Total              100.1         100           100.0        100
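The goodness-of-fit test for the 2 × 8 tables can be repeated from the rounded figures of Table A. The sketch makes three assumptions. First, the one-decimal expectations are taken to be close enough for the purpose. Second, the two largest observed frequencies for those tables are read as 29.5 and 56.5; these half-units presumably come from values of χ² falling exactly on a class limit. Third, the classes from χ² = 15 upwards are pooled so that n' is 4. The helper chi2_p is our own.

```python
import math


def chi2_p(chi2, n_prime):
    """Chance of a chi-square as great as or greater than `chi2`, entered
    with Yule's n' (n' - 1 degrees of freedom), from standard closed forms."""
    df = n_prime - 1
    half = chi2 / 2.0
    if df % 2 == 0:
        term = total = 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    term, total = math.sqrt(chi2), 0.0
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= chi2 / (2 * j + 1)
    return math.erfc(math.sqrt(half)) + math.sqrt(2 / math.pi) * math.exp(-half) * total


# 2 x 8 columns of Table A, with the classes 15-20 and 20- pooled.
expected = [34.0, 47.1, 15.3, 3.0 + 0.6]
observed = [29.5, 56.5, 10, 3 + 1]

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
n_prime = len(expected)            # four classes after pooling
p = chi2_p(chi2, n_prime)
print(f"chi-square = {chi2:.2f}, n' = {n_prime}, P = {p:.3f}")
```

This gives χ² of about 4.35 and P of about 0.23, against 4.36 and 0.22 in the text. The small differences are presumably due to rounding in the printed expectations.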
For tables with 2 rows and 2 columns 350 experimental tables of
100 observations each were available. The observed distribution of
values of χ², calculated from the independence-frequencies, is shown
in Table B, together with the theoretical distribution obtained by
differencing the table on pp. 385-386. Testing goodness of fit on
Table B as it stands, n' is 10, χ² works out at 7.53, and P is 0.583.

TABLE B. Theoretical Distribution of χ² for a Table with 2 Rows and 2
Columns, when χ² is calculated from the Independence-values, compared
with the Actual Results for 350 Experimental Tables. (Ref. 77.)

                  Number of Tables.
Value of χ²       Expected.    Observed.
0-0.25            134.02       122
0.25-0.50          48.15        54
0.50-0.75          32.56        41
0.75-1.00          24.21        24
1-2                56.00        62
2-3                25.91        18
3-4                13.22        13
4-5                 7.05
5-6                 3.86
6-                  5.01
Total             349.99       350
        <pb n="412" />

The theorem last given for evaluating P for an aggregate of
tables is illustrated by the experimental data of Tables C and D.
The values of χ² for the 350 fourfold tables of Table B were
added together in pairs, giving 175 pairs. According to theory
the resulting frequency-distribution for the totals of pairs of χ²'s
should be given by differencing the column of the P-table for
n' = 3. The results of theory and observation are compared in
the first pair of columns of Table C. Testing goodness of fit,
grouping the values of χ² of 7 and upwards, n' is 8, χ² is 5.53, and
P is 0.60.

Grouping the values of χ² for the 350 experimental tables
similarly in sets of three and summing, we get the observed
distribution on the right of Table C, and the theoretical distribution
by differencing the column of the P-table for n' = 4.
Grouping values of χ² of 8 and upwards, and testing goodness of fit
between theory and observation, n' is 9, χ² is 2.18, and P is 0.97.
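The experiment can be imitated on a computer. The sketch below is a stand-in for the physical tables, not a reproduction of them, and rests on several assumptions. Each fourfold table has 100 observations whose two attributes are independent, each present with chance 1/2. The draws use Python's random module with a fixed, arbitrary seed. χ² is computed from the independence-values, and the totals for pairs and for sets of three are compared with P for n' = 3 and n' = 4.

```python
import math
import random


def fourfold_chi2(n_obs, rng):
    """Chi-square from independence-values for one simulated 2 x 2 table.

    Each observation falls in one of the four cells with chance 1/4, i.e. the
    two attributes are independent and each present with chance 1/2.
    """
    cells = rng.choices(range(4), k=n_obs)
    a, b, c, d = (cells.count(i) for i in range(4))   # (AB), (A beta), (alpha B), (alpha beta)
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:          # an empty margin: practically impossible with 100 observations
        return 0.0
    return n_obs * (a * d - b * c) ** 2 / denom


rng = random.Random(1922)
pairs = [fourfold_chi2(100, rng) + fourfold_chi2(100, rng) for _ in range(4000)]
triples = [sum(fourfold_chi2(100, rng) for _ in range(3)) for _ in range(3000)]

# Share of totals at or above a chosen value, against theory.
pair_share = sum(1 for s in pairs if s >= 2.0) / len(pairs)
pair_theory = math.exp(-1.0)                  # P for chi-square = 2, n' = 3
triple_share = sum(1 for s in triples if s >= 3.0) / len(triples)
triple_theory = (math.erfc(math.sqrt(1.5))    # P for chi-square = 3, n' = 4
                 + math.sqrt(2 / math.pi) * math.exp(-1.5) * math.sqrt(3.0))
print(f"pairs:   observed {pair_share:.3f}, theory {pair_theory:.3f}")
print(f"triples: observed {triple_share:.3f}, theory {triple_theory:.3f}")
```

The observed shares should lie within a few hundredths of the theoretical values. A different seed, or more tables, will shift them by the ordinary fluctuations of sampling.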
TABLE C. Theoretical Distribution of Totals of χ² (calculated from Independence-values)
for Pairs and for Sets of Three Tables with 2 Rows and 2
Columns, compared with the Actual Distributions given by Experimental
Tables. n' must be taken as 3 in the first case, and 4 in the second.

                 Pairs of Tables.            Sets of 3 Tables.
Sum of χ²'s      Expectation  Observation    Expectation  Observation
0-1                 68.9          67            23.1          21
1-2                 41.8          46            26.5          26
2-3                 25.3          22            21.0          22
3-4                 15.4          19            15.1          19
4-5                  9.3                        10.4
5-6                  5.6                         7.0
6-7                  3.4                         4.6
7-8                  2.1                         3.0
8-                   3.2                         5.3
Total              175.0         175           116.0         116

Table D makes a similar comparison for the values of χ²
calculated from independence, for 100 pairs of 4 × 4 tables.
Here there are 9 algebraically independent δ's for each table of
the pair, and consequently n' must be taken as 19. Differencing
the P-table for n' = 19, the expected distribution is obtained, which
is shown in the first column of Table D, the observed distribution

        <pb n="413" />
being given in the second column. Taking the two groups at the
bottom of the table together and testing goodness of fit, χ² is
found to be 4.11, n' is 5, and P is 0.39.
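When n' is odd, the number of degrees of freedom is even. That is the case for Table D, where n' = 19, and P then has a particularly simple form: it is the chance that a Poisson variable with mean χ²/2 falls below (n' - 1)/2. Yule obtained the expectations by differencing the printed P-table; the sketch below instead uses that standard identity to recompute them for 100 pairs of tables.

```python
import math


def p_odd_n_prime(chi2, n_prime):
    """P for odd n' (even degrees of freedom n' - 1).

    Equals the probability that a Poisson variable with mean m = chi2/2 is
    smaller than (n' - 1)/2, i.e. exp(-m) * sum of m**k / k! for
    k = 0 .. (n' - 3)/2.
    """
    if n_prime % 2 == 0:
        raise ValueError("this closed form needs an odd n'")
    m = chi2 / 2.0
    term = total = 1.0
    for k in range(1, (n_prime - 1) // 2):
        term *= m / k
        total += term
    return math.exp(-m) * total


edges = [0, 10, 15, 20, 25, 30]          # class limits of Table D
p = [p_odd_n_prime(x, 19) for x in edges]
expected = [100 * (a - b) for a, b in zip(p, p[1:])] + [100 * p[-1]]
for lower, e in zip(edges, expected):
    print(f"{lower:>2}-   {e:5.2f}")
```

The computed values are about 6.81, 26.99, 32.92, 20.79, 8.75 and 3.74. The printed 8.8 for the class 25-30 is presumably rounded so that the column adds to 100.0.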

TABLE D. Theoretical Distribution of Totals of χ² (calculated from Independence-values)
for Pairs of Tables with 4 Rows and 4 Columns, compared with
the Actual Distribution given by Experimental Tables.

Sum of χ²'s      Expectation    Observation
0-10                 6.8             8
10-15               27.0            27
15-20               32.9            31
20-25               20.8            27
25-30                8.8
30-                  3.7
Total              100.0           100

The general theorem that n' must be taken equal to the number
of algebraically independent frequencies increased by unity applies
not only to association and contingency tables, but to all cases in
which the frequencies observed are connected with those expected
by a number of linear relations, beyond their restriction to the
same total frequency (Fisher, ref. 76). Thus, if a frequency
curve has been fitted by the mean and standard deviation, n'
should be taken as 2 less than the number of classes; if it has
been fitted by the first four moments, n' should be taken as four
less than the number of classes.
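The same rule can be put in one line. Suppose the expected frequencies agree with the observed in their total and in m further constants fitted from the data; then n' is the number of classes less m. In the small sketch below, the function name and the twelve-class curve are hypothetical. For a contingency table the independence-values reproduce r - 1 free row totals and c - 1 free column totals. The same arithmetic therefore gives back n' = (r - 1)(c - 1) + 1.

```python
def n_prime_fitted(n_classes, n_fitted):
    """n' when expectation and observation agree in total and in
    `n_fitted` further constants estimated from the data."""
    return n_classes - n_fitted


# Hypothetical frequency curve with 12 classes:
print(n_prime_fitted(12, 2))    # fitted by mean and standard deviation -> 10
print(n_prime_fitted(12, 4))    # fitted by the first four moments -> 8

# A contingency table: the independence-values reproduce the r - 1 free row
# totals and the c - 1 free column totals, so n' = rc - (r - 1) - (c - 1).
for r, c in [(2, 2), (4, 4), (2, 8)]:
    print(r, c, n_prime_fitted(r * c, (r - 1) + (c - 1)))   # 2, 10, 8
```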

        <pb n="414" />
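Table of the Values of P for Divergence from Independence in the
Fourfold Table.
(Δ is the difference between one value of P and the next, in units of the fifth decimal place.)
A. χ² = 0 to χ² = 1 by steps of 0.01.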

χ²     P         Δ      χ²     P         Δ
0.00   1.00000   7966   0.50   0.47950   436
0.01   0.92034   3280   0.51   0.47514   430
0.02   0.88754   2505   0.52   0.47084   423
0.03   0.86249   2101   0.53   0.46661   418
0.04   0.84148   1842   0.54   0.46243   411
0.05   0.82306   1656   0.55   0.45832   406
0.06   0.80650   1516   0.56   0.45426   400
0.07   0.79134   1404   0.57   0.45026   395
0.08   0.77730   1312   0.58   0.44631   389
0.09   0.76418   1235   0.59   0.44242   384
0.10   0.75183   1169   0.60   0.43858   379
0.11   0.74014   1111   0.61   0.43479   374
0.12   0.72903   1060   0.62   0.43105   369
0.13   0.71843   1015   0.63   0.42736   365
0.14   0.70828   974    0.64   0.42371   360
0.15   0.69854   938    0.65   0.42011   355
0.16   0.68916   905    0.66   0.41656   351
0.17   0.68011   874    0.67   0.41305   346
0.18   0.67137   845    0.68   0.40959   343
0.19   0.66292   820    0.69   0.40616   338
0.20   0.65472   795    0.70   0.40278   334
0.21   0.64677   773    0.71   0.39944   330
0.22   0.63904   752    0.72   0.39614   326
0.23   0.63152   731    0.73   0.39288   322
0.24   0.62421   713    0.74   0.38966   318
0.25   0.61708   696    0.75   0.38648   315
0.26   0.61012   679    0.76   0.38333   311
0.27   0.60333   663    0.77   0.38022   308
0.28   0.59670   648    0.78   0.37714   304
0.29   0.59022   634    0.79   0.37410   301
0.30   0.58388   620    0.80   0.37109   297
0.31   0.57768   607    0.81   0.36812   294
0.32   0.57161   595    0.82   0.36518   291
0.33   0.56566   583    0.83   0.36227   287
0.34   0.55983   572    0.84   0.35940   285
0.35   0.55411   560    0.85   0.35655   281
0.36   0.54851   551    0.86   0.35374   278
0.37   0.54300   540    0.87   0.35096   276
0.38   0.53760   530    0.88   0.34820   272
0.39   0.53230   521    0.89   0.34548   270
0.40   0.52709   512    0.90   0.34278   267
0.41   0.52197   503    0.91   0.34011   264
0.42   0.51694   495    0.92   0.33747   261
0.43   0.51199   487    0.93   0.33486   258
0.44   0.50712   479    0.94   0.33228   256
0.45   0.50233   471    0.95   0.32972   253
0.46   0.49762   463    0.96   0.32719   251
0.47   0.49299   457    0.97   0.32468   248
0.48   0.48842   449    0.98   0.32220   246
0.49   0.48393   443    0.99   0.31974   243
0.50   0.47950   436    1.00   0.31731   241
        <pb n="415" />
B. χ² = 1 to χ² = 10 by steps of 0.1.

χ²     P         Δ      χ²     P         Δ
1.0    0.31731   2304   5.5    0.01902   106
1.1    0.29427   2095   5.6    0.01796   99
1.2    0.27332   1911   5.7    0.01697   94
1.3    0.25421   1749   5.8    0.01603   89
1.4    0.23672   1605   5.9    0.01514   83
1.5    0.22067   1477   6.0    0.01431   79
1.6    0.20590   1361   6.1    0.01352   74
1.7    0.19229   1258   6.2    0.01278   71
1.8    0.17971   1163   6.3    0.01207   66
1.9    0.16808   1078   6.4    0.01141   62
2.0    0.15730   1000   6.5    0.01079   59
2.1    0.14730   929    6.6    0.01020   56
2.2    0.13801   864    6.7    0.00964   52
2.3    0.12937   803    6.8    0.00912   50
2.4    0.12134   749    6.9    0.00862   47
2.5    0.11385   699    7.0    0.00815   44
2.6    0.10686   651    7.1    0.00771   42
2.7    0.10035   609    7.2    0.00729   39
2.8    0.09426   568    7.3    0.00690   38
2.9    0.08858   532    7.4    0.00652   35
3.0    0.08326   497    7.5    0.00617   33
3.1    0.07829   465    7.6    0.00584   32
3.2    0.07364   436    7.7    0.00552   30
3.3    0.06928   408    7.8    0.00522   28
3.4    0.06520   383    7.9    0.00494   26
3.5    0.06137   359    8.0    0.00468   25
3.6    0.05778   337    8.1    0.00443   24
3.7    0.05441   316    8.2    0.00419   23
3.8    0.05125   296    8.3    0.00396   21
3.9    0.04829   279    8.4    0.00375   20
4.0    0.04550   262    8.5    0.00355   19
4.1    0.04288   246    8.6    0.00336   18
4.2    0.04042   231    8.7    0.00318   17
4.3    0.03811   217    8.8    0.00301   16
4.4    0.03594   205    8.9    0.00285   15
4.5    0.03389   192    9.0    0.00270   14
4.6    0.03197   181    9.1    0.00256   14
4.7    0.03016   170    9.2    0.00242   13
4.8    0.02846   160    9.3    0.00229   12
4.9    0.02686   151    9.4    0.00217   12
5.0    0.02535   142    9.5    0.00205   10
5.1    0.02393   134    9.6    0.00195   11
5.2    0.02259   126    9.7    0.00184   10
5.3    0.02133   119    9.8    0.00174   9
5.4    0.02014   112    9.9    0.00165   8
5.5    0.01902   106    10.0   0.00157

For values of P corresponding to χ² = 11 to χ² = 30, by units, see Table XV. (c),
p. 30 of Tables for Statisticians and Biometricians.
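For the fourfold table n' = 2, which means one degree of freedom. P is then the complementary error function of √(χ²/2), which Python's standard library provides as math.erfc. The sketch below is our own and regenerates rows of tables A and B above, with Δ taken as the difference between one value of P and the next, in units of the fifth decimal place. Rounding conventions may differ from the printed tables in the last figure.

```python
import math


def p_fourfold(chi2):
    """P for the fourfold table: n' = 2, i.e. one degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


def table_rows(start, stop, step):
    """Rows (chi-square, P to five places, Delta), Delta being P minus the
    next tabled P, expressed in units of the fifth decimal place."""
    n_steps = round((stop - start) / step)
    rows = []
    for i in range(n_steps + 1):
        x = start + i * step
        p = round(p_fourfold(x), 5)
        p_next = round(p_fourfold(x + step), 5)
        rows.append((round(x, 2), p, round((p - p_next) * 100000)))
    return rows


rows_a = table_rows(0.0, 1.0, 0.01)    # table A: 0 to 1 by 0.01
rows_b = table_rows(1.0, 10.0, 0.1)    # table B: 1 to 10 by 0.1
for x, p, delta in rows_a[:3] + rows_b[-3:]:
    print(f"{x:5.2f}  {p:.5f}  {delta:5d}")
```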

        <pb n="416" />
ADDITIONAL REFERENCES.
History of Official Statistics (p. 6).

(1) KOREN, J. (edited by), The History of Statistics, their Development and
Progress in many Countries, New York, The Macmillan Co., 1918.
(A collection of articles, mainly on the progress of official statistics,
written by a specialist for each country.)

Contingency (p. 73).

(2) PEARSON, KARL, “On the Measurement of the Influence of Broad
Categories on Correlation,” Biometrika, vol. ix., 1913, p. 116.

(3) PEARSON, KARL, “On the General Theory of Multiple Contingency with
Special Reference to Partial Contingency,” Biometrika, vol. xi., 1916,
p. 145. (An extension of the method of contingency coefficients to
classifications subjected to various conditions; arithmetical examples
are provided in the undermentioned paper.)

(4) PEARSON, KARL, and J. F. TOCHER, “On Criteria for the Existence of
Differential Death-Rates,” Biometrika, vol. xi., 1916, p. 159.

(5) RITCHIE-SCOTT, A., “The Correlation Coefficient of a Polychoric Table,”
Biometrika, vol. xii., 1918, p. 93. (Considers various methods of
measuring association with special reference to 4 × 3-fold classifications.)

(6) PEARSON, KARL, and E. S. PEARSON, “On Polychoric Coefficients of
Correlation,” Biometrika, vol. xiv., 1922, p. 127.

The Mode (p. 130).

(7) DOODSON, ARTHUR T., “Relation of the Mode, Median and Mean, in
Frequency Curves,” Biometrika, vol. xi., 1916-17, p. 429. (Gives a
proof of the relation noted on p. 121.)

Index-numbers (p. 130).
There are useful discussions as to method in the following:

(8) KNIBBS, G. H., “Prices, Price-Indexes, and Cost of Living in Australia,”
Commonwealth of Australia, Labour and Industrial Branch, Report
No. 1, 1912.

(9) WOOD, FRANCES, “The Course of Real Wages in London, 1900-12,”
Jour. Roy. Stat. Soc., vol. lxxvii., 1913-14, p. 1.

(10) WORKING CLASSES COST OF LIVING COMMITTEE, 1918, Report (Cd.
8980, 1918), H.M. Stationery Office.

(11) BOWLEY, A. L., “The Measurement of Changes in Cost of Living,”
Jour. Roy. Stat. Soc., vol. lxxxii., 1919, p. 343.

(12) BENNETT, T. L., “The Theory of Measurement of Changes in the Cost
of Living,” Jour. Roy. Stat. Soc., vol. lxxxiii., 1920, p. 455.

(13) FLUX, A. W., “The Measurement of Price Changes,” Jour. Roy. Stat.
Soc., vol. lxxxiv., 1921, p. 167.

(14) FISHER, IRVING, “The Best Form of Index-number,” Quart. Pub.
Amer. Stat. Ass., March 1921, p. 533.

(15) PERSONS, W. M., “Fisher’s Formula for Index-numbers,” Rev. Econ.
Statistics, vol. iii., 1921, p. 108.

(16) MARCH, L., “Les modes de mesure du mouvement général des prix,”
Metron, vol. i., No. 4, 1921, p. 40.

(17) FISHER, IRVING, The Making of Index-numbers, Houghton Mifflin Co.,
Boston and New York, 1922. (Useful as a repertory of formulae, with
tests of the results given on certain American data; otherwise, cf.
reviews in Economic Journal, vol. xxxiii., p. 90 and p. 246, and
Jour. Roy. Stat. Soc., vol. lxxxvi., p. 424.)

        <pb n="417" />

(18) MARSHALL, A., Money, Credit, and Commerce, Macmillan, London, 1923.

For the student of the cost of living in Great Britain the following
are useful:

(19) “Labour Gazette Index Number: Scope and Method of Compilation,”
Lab. Gaz., March 1920 and Feb. 1921.

(20) “Final Report on the Cost of Living of the Parliamentary Committee
of the Trades Union Congress” (The Committee, 32 Eccleston Sq.,
London, 1921); critical notices of the same in the Labour Gazette, Aug.
and Sept. 1921; and review by A. L. Bowley, Econ. Jour., Sept. 1921.

(21) BOWLEY, A. L., Prices and Wages in the United Kingdom, 1914-20,
Oxford, 1920 (Clarendon Press).

(22) MARCH, L., “Rapport sur les indices de la situation économique,”
Bulletin de l’Institut International de Statistique, t. xxi., pt. 2, p. 3.

(23) GINI, C., “Quelques considérations au sujet de la construction des
nombres indices des prix, etc.,” Metron, vol. iv., 1924, p. 3.

(24) EDGEWORTH, F. Y., “The Plurality of Index Numbers,” Economic
Journal, vol. xxxv., 1925, p. 379.

(25) EDGEWORTH, F. Y., “The Element of Probability in Index Numbers,”
Jour. Roy. Stat. Soc., vol. lxxxviii., 1925, p. 557.

(26) BOWLEY, A. L., “The Influence on the Precision of Index Numbers of
the Correlation between the Prices of Commodities,” Jour. Roy. Stat.
Soc., vol. lxxxix., 1926, p. 300.

Correlation : History (p. 188).

(27) PEARSON, K., “Notes on the History of Correlation,” Biometrika, vol.
xiii., 1920, p. 25.

Fit of Regression Lines (p. 209).

(28) PEARSON, KARL, “On the Application of Goodness of Fit Tables to test
Regression Curves and Theoretical Curves used to describe Observational
or Experimental Data,” Biometrika, vol. xi., 1916-17, p. 237.
(Criticises and extends the work of Slutsky.)

(29) FISHER, R. A., “The Goodness of Fit of Regression Formulae, and the
Distribution of Regression Coefficients,” Jour. Roy. Stat. Soc., vol.
lxxxv., 1922, p. 597.

Correlation in Case of Non-linear Regression (p. 209).

(30) WICKSELL, S. D., “On Logarithmic Correlation, with an Application to
the Distribution of Ages at First Marriage,” Meddelande från Lunds
Astronomiska Observatorium, No. 84, 1917. Svenska Aktuarieförenings
Tidskrift.

(31) WICKSELL, S. D., “The Correlation Function of Type A,” Kungl.
Svenska Vetenskapsakademiens Handl., Bd. lviii., 1917.

(32) PEARSON, K., “On a General Method of Determining the Successive
Terms in a Skew Regression Line,” Biometrika, vol. xiii., 1921, p. 296.

(33) PEARSON, KARL, “On the Correction necessary for the Correlation
Ratio η,” Biometrika, vol. xiv., 1923, p. 412.

Correlation : Effect of Errors of Observation, etc. (p. 225).

(34) HART, BERNARD, and C. SPEARMAN, “General Ability, its Existence
and Nature,” Brit. Jour. Psychology, vol. v., 1912, p. 51.

There has been a good deal of controversy about these formulae and
their applications in psychological work: cf. (119) Brown and Thomson,
and the references there given, critical notice of the same in Brit. Jour.
Psych., vol. xii., 1921, p. 100, and the following:
        <pb n="418" />
(35) STEAD, H. G., “The Correction of Correlation Coefficients,” Jour. Roy.
Stat. Soc., vol. lxxxvi., 1923, p. 412.

Standardisation or Correction of Death-rates (p. 226).

For the methods of standardisation in present use in England and
Wales see:

(36) Seventy-fourth Annual Report of the Registrar-General of Births, Deaths,
and Marriages in England and Wales (1911). [Cd. 6578, 1913.]

Reference may also be made to:

(37) WOLFENDEN, H. H., “On the Methods of comparing the Mortalities of
Two or More Communities, and the Standardisation of Death-rates,”
Jour. Roy. Stat. Soc., vol. lxxxvi., 1923, p. 399.

Correlation : Time-problem (p. 208) and Miscellaneous (p. 226).

(38) HARRIS, J. ARTHUR, “The Correlation between a Component, and
between the Sum of Two or More Components, and the Sum of the
Remaining Components of a Variable,” Quart. Pub. American Stat.
Ass., vol. xv., 1917, p. 854.

(39) YULE, G. U., “On the Time-correlation Problem,” Jour. Roy. Stat. Soc.,
vol. lxxxiv., 1921, p. 497.

(40) WICKSELL, S. D., “An Exact Formula for Spurious Correlation,” Metron,
vol. i., No. 4, 1921, p. 33.

(41) PEARSON, KARL, and E. M. ELDERTON, “On the Variate Difference
Method,” Biometrika, vol. xiv., 1923, p. 281.

(42) ANDERSON, O., “Ueber ein neues Verfahren bei Anwendung der
‘Variate-Difference’ Methode,” Biometrika, vol. xv., 1923, p. 134.

(43) YULE, G. U., “Why do we sometimes get Nonsense Correlations
between Time-Series? A Study in Sampling and the Nature of
Time-Series,” Jour. Roy. Stat. Soc., vol. lxxxix., 1926, p. 1.

(44) ANDERSON, O., “Ueber die Anwendung der Differenzenmethode
(Variate Difference Method) bei Reihenausgleichungen,
Stabilitätsuntersuchungen, und Korrelationsmessungen,” Biometrika,
vol. xviii., 1926, p. 293.

Partial Correlation and Partial Correlation Ratio (p. 252).

(45) KELLEY, T. L., “Tables to facilitate the Calculation of Partial Coefficients
of Correlation and Regression Equations,” Bulletin of the
University of Texas, No. 27, 1916. (Tables giving the values of
$1/\sqrt{(1-r_{13}^2)(1-r_{23}^2)}$ and $r_{13}r_{23}/\sqrt{(1-r_{13}^2)(1-r_{23}^2)}$.)

(46) PEARSON, KARL, “On the Partial Correlation Ratio,” Proc. Roy. Soc.,
Series A, vol. xci., 1915, p. 492.

(47) ISSERLIS, L., “On the Partial Correlation Ratio; Part ii., Numerical,”
Biometrika, vol. xi., 1916-17, p. 50.

(48) MINER, J. R., Tables of $\sqrt{1-r^2}$ and $1-r^2$ for use in Partial Correlation,
etc., The Johns Hopkins Press, Baltimore, 1922. (Six-figure tables.)

(49) KELLEY, T. L., and F. S. SALISBURY, “An Iteration Method for
determining Multiple Correlation Constants,” Jour. Amer. Stat. Assoc.,
vol. xxi., 1926, p. 282.

Sampling of Attributes (p. 273).
(50) DETLEFSEN, J. A., “Fluctuations of Sampling in a Mendelian Population,”
Genetics, vol. iii., 1918, p. 599.

        <pb n="419" />

(51) RHODES, E. C., “On the Problem whether two given Samples can be
supposed to have been drawn from the same Population,” Biometrika,
vol. xvi., 1924, p. 239, and Metron, vol. v., 1925, p. 3.

(52) PEARSON, KARL, “On the Difference and the Doublet Tests for Ascertaining
whether Two Samples have been drawn from the same Population,”
Biometrika, vol. xvi., 1924, p. 249.

The Law of Small Chances (p. 273).

(53) BORTKIEWICZ, L. VON, “Realismus und Formalismus in der mathematischen
Statistik,” Allgemein. Stat. Arch., vol. ix., 1916, p. 225.
(Continues the discussion initiated by the paper of Miss Whitaker,
cited on p. 273.)

(54) GREENWOOD, M., and G. UDNY YULE, “On the Statistical Interpretation
of some Bacteriological Methods employed in Water Analysis,”
Journal of Hygiene, vol. xvi., 1917, p. 36. (Applies a criterion
developed from Poisson’s limit to the discrimination of water analyses;
numerous arithmetical examples.)

(55) “STUDENT,” “An Explanation of Deviations from Poisson’s Law in
Practice,” Biometrika, vol. xii., 1919, p. 211.

(56) BORTKIEWICZ, L. VON, “Ueber die Zeitfolge zufälliger Ereignisse,”
Bull. de l’Institut Int. de Stat., tome xx., 2e livr., 1915.

(57) MORANT, G., “On Random Occurrences in Space and Time when
followed by a Closed Interval,” Biometrika, vol. xiii., 1921, p. 309.

See also reference 73.

Frequency Curves (p. 314).

(58) PEARSON, KARL, “Second Supplement to a Memoir on Skew Variation,”
Phil. Trans. Roy. Soc., Series A, vol. ccxvi., 1916, p. 429. (Completes
the description of type frequency curves contained in references (1) and
(3) of p. 105.)

The advanced student who desires to compare the merits of the different
frequency systems proposed should consult the two following:

(59) CHARLIER, C. V. L., Numerous papers issued from the Astronomical
Department of Lund, 1906-12, especially “Contributions to the
Mathematical Theory of Statistics” (1912).

(60) EDGEWORTH, F. Y., “On the Mathematical Representation of Statistical
Data,” Jour. Roy. Stat. Soc., vol. lxxix., 1916, p. 456; lxxx., pp.
65, 266, 411; lxxxi., 1918, p. 322.

(61) SOPER, H. E., Frequency Arrays, Cambridge University Press, 1922.

(62) EDGEWORTH, F. Y., “Untried Methods of Representing Frequency,”
Jour. Roy. Stat. Soc., vol. lxxxvii., 1924, p. 571.

(63) ROMANOVSKY, V., “Generalisation of some Types of the Frequency
Curves of Professor Pearson,” Biometrika, vol. xvi., 1924, p. 106.

(64) PEARSON, KARL, “Historical Note on the Origin of the Normal Curve
of Errors,” Biometrika, vol. xvi., 1924, p. 402.

(65) RHODES, E. C., “On the Generalised Law of Error,” Jour. Roy. Stat.
Soc., vol. lxxxviii., 1925, p. 576.

(66) EDGEWORTH, F. Y., “Mr Rhodes’s Curve and the Method of Adjustment,”
Jour. Roy. Stat. Soc., vol. lxxxix., 1926, p. 129.

(These papers are concerned with the general theory of frequency
systems ; the undermentioned deal with the forms which are suitable
for the representation of particular classes of data, especially statistics
of epidemic disease.)

        <pb n="420" />

(67) BROWNLEE, J., “The Mathematical Theory of Random Migration and
Epidemic Distribution,” Proc. Roy. Soc. Edin., vol. xxxi., 1910-11,
p. 262.

(68) BROWNLEE, J., “Certain Aspects of the Theory of Epidemiology in
Special Reference to Plague,” Proc. Roy. Soc. Medicine, Sect. Epidemiology
and State Medicine, vol. xi., 1918, p. 85. (The appendix
to this paper summarises the author’s results and those of Sir Ronald
Ross; vide infra.)

(69) ROSS, SIR RONALD, “An Application of the Theory of Probabilities to
the Study of a priori Pathometry,” Proc. Roy. Soc., A, vol. xcii.,
1916, p. 204.

(70) ROSS, SIR RONALD, and HILDA P. HUDSON, “An Application of the
Theory of Probabilities to the Study of a priori Pathometry,” Pts. II.
and III., Proc. Roy. Soc., A, vol. xciii., 1917, pp. 212 and 225.

(71) KNIBBS, G. H., “The Mathematical Theory of Population,” Appendix A
to vol. i. of Census of the Commonwealth of Australia. (Contains a
full discussion of the application of various frequency systems to vital
statistics.)

(72) MOIR, H., “Mortality Graphs,” Trans. Actuarial Soc. America, vol.
xviii., 1917, p. 311. (Numerous graphs of mortality rates in different
classes and periods.)

(73) GREENWOOD, M., and G. U. YULE, “An Enquiry into the Nature of
Frequency Distributions representative of Multiple Happenings, with
particular reference to the Occurrence of Multiple Attacks of Disease
or of Repeated Accidents,” Jour. Roy. Stat. Soc., vol. lxxxiii., 1920,
p. 255.

Goodness of Fit (p. 315 and p. 370).

(74) PEARSON, KARL, “On a Brief Proof of the Fundamental Formula for
testing the Goodness of Fit of Frequency Distributions and on the
Probable Error of P,” Phil. Mag., vol. xxxi. (6th ser.), 1916, p. 369.

(75) PEARSON, KARL, “Multiple Cases of Disease in the same House,”
Biometrika, vol. ix., 1913, p. 28. (A modification of the goodness-of-fit
test to cover such statistics as those indicated by the title.)

(76) FISHER, R. A., “On the Interpretation of χ² from Contingency Tables,
and the Calculation of P,” Jour. Roy. Stat. Soc., vol. lxxxv., 1922,
p. 87.

(77) YULE, G. U., “On the Application of the χ² Method to Association and
Contingency Tables, with experimental illustrations,” Jour. Roy. Stat.
Soc., vol. lxxxv., 1922, p. 95. After correspondence with Mr Fisher
I wish to withdraw the statement on p. 97 of this paper, that a full
proof [of the general theorem as applied to contingency tables] seems
still to be lacking: he has convinced me that his proof covers the case.

The three following relate to the controversy on the two preceding papers:

(78) PEARSON, KARL, “On the χ² Test of Goodness of Fit,” Biometrika,
vol. xiv., 1922, p. 186; and “Further Note,” ibid., p. 418.

(79) BOWLEY, A. L., and R. L. CONNOR, “Tests of Correspondence between
Statistical Grouping and Formulae,” Economica, 1923, p. 1.

(80) FISHER, R. A., “Statistical Tests of Agreement between Observation
and Hypothesis” (with a note in reply by A. L. Bowley), Economica,
1923, p. 139.

(81) FISHER, R. A., “The Conditions under which χ² measures the discrepancy
between Observation and Hypothesis,” Jour. Roy. Stat. Soc.,
vol. lxxxvii., 1924, p. 442.

See also references 28 and 29.

        <pb n="421" />

Probable Errors, etc.: General References (p. 355).

(82) ISSERLIS, L., “On the Value of a Mean as calculated from a Sample,”
Jour. Roy. Stat. Soc., vol. lxxxi., 1918, p. 75.

(83) SOPER, H. E., and Others, “On the Distribution of the Correlation
Coefficient in Small Samples,” Biometrika, vol. xi., 1916-17, p. 328.

(84) PEARSON, KARL, “On the Probable Error of Biserial η,” Biometrika,
vol. xi., 1916-17, p. 292.

(85) YOUNG, ANDREW, and KARL PEARSON, “On the Probable Error of a
Coefficient of Contingency without Approximation,” Biometrika, vol.
xi., 1916-17, p. 215.

(86) Editorial, “On the Probable Errors of Frequency Constants,” Pt. III.,
Biometrika, vol. xiii., 1920, p. 113.

(87) “STUDENT,” “An Experimental Determination of the Probable Error of
Dr Spearman’s Correlation Coefficients,” Biometrika, vol. xiii., 1921,
p. 263.

(88) BISPHAM, J. W., “An Experimental Determination of the Distribution
of the Partial Correlation Coefficient in Samples of Thirty,” Proc. Roy.
Soc., A, vol. xcvii., 1920, and Metron, vol. ii., 1923, p. 684.

(89) TSCHUPROW, A. A., “On the Mathematical Expectation of the
Moments of Frequency Distributions,” Biometrika, vol. xii., 1918-19,
pp. 140 and 185, and vol. xiii., 1921, p. 283; and Metron, vol. ii.,
1923, pp. 461 and 646.

(90) FISHER, R. A., “On the Probable Error of a Coefficient of Correlation
deduced from a Small Sample,” Metron, vol. i., No. 4, 1921, p. 3.

(91) FISHER, R. A., “On the Mathematical Foundations of Theoretical
Statistics,” Phil. Trans., A, vol. ccxxii., 1922, p. 309.

(92) PEARSON, E. S., “The Probable Error of a Class-index Correlation,”
Biometrika, vol. xiv., 1923, p. 261.

(93) FISHER, R. A., “The Distribution of the Partial Correlation Coefficient,”
Metron, vol. iii., 1924, p. 329.

(94) PEARSON, E. S., “Note on the Approximations to the Probable Error of
a Coefficient of Correlation,” Biometrika, vol. xvi., 1924, p. 196.

(95) CHURCH, A. E. R., “On the Moments of the Distribution of Squared
Standard Deviations for Samples of N drawn from an indefinitely
large Population,” Biometrika, vol. xvii., 1925, p. 79.

(96) PEARSON, KARL, “Further Contributions to the Theory of Small
Samples,” Biometrika, vol. xvii., 1925, p. 176.

(97) SPLAWA-NEYMAN, J., “Contributions to the Theory of Small Samples
drawn from a Finite Population,” Biometrika, vol. xvii., 1925, p. 472.

(98) FISHER, R. A., “Applications of ‘Student’s’ Distribution” (and following
Tables by “Student”), Metron, vol. v., No. 3, 1925, p. 90.

(99) TSCHUPROW, A. A., “On the Asymptotic Frequency Distributions of
the Arithmetic Means of n Correlated Observations for very great
Values of n,” Jour. Roy. Stat. Soc., vol. lxxxviii., 1925, p. 91.

(100) RHODES, E. C., “The Comparison of two Sets of Observations,”
Jour. Roy. Stat. Soc., vol. lxxxix., 1926, p. 544.

(101) CHURCH, A. E. R., “On the Means and Squared Standard Deviations
of Small Samples from any Population,” Biometrika, vol. xviii., 1926,
p. 321.

On the problem of fluctuations of sampling in correlations between
time-series, see also Yule (43).

        <pb n="422" />
Errors of Sampling in Agricultural Experiment.
A good deal of work has been done on this particular branch of the
subject, and the following references may be useful:

(102) BERRY, R. A., and D. G. O’BRIEN, “Errors in Feeding Experiments
with Cross-bred Pigs,” Jour. Agr. Sci., vol. xi., 1921, p. 275.

(103) HARRIS, J. A., “On a Criterion of Substratum Homogeneity (or Heterogeneity)
in Field Experiments,” Amer. Naturalist, 1916, p. 430.

(104) HALL, A. D., E. J. RUSSELL, T. B. WOOD, S. U. PICKERING, S. H.
COLLINS, “The Interpretation of the Results of Agricultural Experiments,”
Journal of the Board of Agriculture, Supplement 7, 1911.
(Contains a collection of papers on error in field trials, feeding experiments,
horticultural work, milk-testing, etc.)

(105) LYON, T. L., “Some Experiments to estimate Errors in Field Plat
Tests,” Proc. Amer. Soc. of Agronomy, vol. iii., 1911, p. 89.

(106) MERCER, W. B., and A. D. HALL, “The Experimental Error of Field
Trials,” Jour. Agr. Science, vol. iv., 1911, p. 107. (With an appendix
by “Student” describing the chessboard method of conducting yield
trials.)

(107) MITCHELL, H. H., and H. S. GRINDLEY, “The Element of Uncertainty
in the Interpretation of Feeding Experiments,” Univ. of Illinois Agr.
Exp. Station, Bulletin 165, 1913.

(108) ROBINSON, G. W., and W. E. LLOYD, “On the Probable Error of
Sampling in Soil Surveys,” Jour. Agr. Science, vol. viii., 1915, p. 144.

(109) SURFACE, F. M., and RAYMOND PEARL, “A Method of Correcting for Soil
Heterogeneity in Variety Tests,” Jour. Agr. Research, vol. v., 1916,
p. 1039.

(110) WOOD, T. B., and R. A. BERRY, “Variation in the Chemical Composition
of Mangels,” Jour. Agr. Science, vol. i., 1905, p. 16.

(111) WOOD, T. B., “The Feeding Value of Mangels,” Jour. Agr. Science,
vol. iii., 1910, p. 225.

(112) WOOD, T. B., and F. J. M. STRATTON, “The Interpretation of Experimental
Results,” Jour. Agr. Science, vol. iii., 1910, p. 417.

(113) BEAVEN, E. S., “Trials of New Varieties of Cereals,” Jour. Min. Agric.
(England and Wales), vol. xxix., 1922, pp. 387 and 436.

(114) “STUDENT,” “On Testing Varieties of Cereals,” Biometrika, vol. xv.,
1923, p. 271, and supplementary note, vol. xvi., 1924, p. 411.

(115) ENGLEDOW, F. L., and G. U. YULE, “The Principles and Practice
of Yield-Trials,” Empire Cotton-Growing Corporation, Millbank House,
Millbank, London, S.W. 1, 1926. Price 2s. (Reprint from the Empire
Cotton-growing Review.)

Works on Theory of Statistics, Probability, etc.
(App. II., p. 361).

(116) BACHELIER, L., Calcul des probabilités, tome i., Gauthier-Villars,
Paris, 1912.

(117) BACHELIER, L., Le jeu, la chance, et le hasard, Flammarion, Paris, 1914.

(118) BOWLEY, A. L., Elements of Statistics, P. S. King, London, 4th ed.,
1920. (This edition has been much extended in Part II., “Applications
of Mathematics to Statistics”: the two Parts can now be
purchased separately.)

(119) BROWN, W., and G. H. THOMSON, The Essentials of Mental Measurement,
2nd ed., Cambridge University Press, 1921.

        <pb n="423" />
        SUPPLEMENTS—ADDITIONAL REFERENCES.

(120) BRUNT, DAVID, The Combination of Observations, Cambridge University
Press, 1917.

(121) CZUBER, E., Die stat. Forschungsmethode, L. W. Seidel, Wien, 1921.

(122) ELDERTON, W. PALIN, Addendum to Frequency Curves and Correlation,
London, 1917 (Layton).

(123) FISHER, ARNE, The Mathematical Theory of Probabilities and its
Application to Frequency Curves and Statistical Methods, vol. i., New
York (Macmillan), 1915 : 2nd ed., enlarged, 1922.

(124) FORCHER, HUGO, Die statistische Methode als selbständige Wissenschaft,
Leipzig, 1913 (Veit).

(125) HENRY, A., Calculus and Probability for Actuarial Students, C. and E.
Layton, London, 1922.

(126) JONES, D. C., A First Course in Statistics, Bell &amp; Sons, London, 1921.

(127) JULIN, A., Principes de statistique théorique et appliquée: tome i.,
Statistique théorique, Paris (Rivière), Bruxelles (Dewit), 1921.

(128) KEYNES, J. M., A Treatise on Probability, Macmillan, London, 1921.

(129) WEST, C. J., Introduction to Mathematical Statistics, Adams &amp; Co.,
Columbus, 1918.

An inexpensive reprint of Laplace’s Essai philosophique (ref. 17 on p. 361)
has been published by Gauthier-Villars (Paris, 1921) in the series entitled
“Les maîtres de la pensée scientifique.”

Since the publication of the Seventh Edition, interest in statistical method
has been evidenced by the issue of a rapidly increasing number of books on the
subject. Of those in the following list, the first five will all be found useful
as supplementing the present volume. Pearl's work is specially intended for
those interested in vital statistics, but will be useful also to others. Kelley’s
book covers a great deal of ground not touched in the present volume and,
though more critical discussion of some of the methods seems to me desirable,
the student will find much that is not otherwise accessible in volume form.
In the very useful handbook edited by H. L. Rietz, each chapter is written
by a specialist; chapters on Interpolation, Curve Fitting, and Periodogram
Analysis, for example, all deal with matters not discussed in this Introduction.
R. A. Fisher's Statistical Methods is a laboratory handbook rather than a
text-book, and brings together in convenient form for the research worker
the numerous special methods developed, mainly by himself, with especial
reference to small samples. Whittaker and Robinson’s treatise is advanced
and covers a wide field for statisticians and others. The little book by the
late Professor Tschuprow the student may not find easy reading, but it deals
with fundamentals. The remaining books on the list are of a somewhat more
elementary character.

(130) PEARL, R., Introduction to Medical Biometry and Statistics, W. B.
Saunders Co., Philadelphia and London, 1923.

(131) KELLEY, TRUMAN L., Statistical Method, The Macmillan Co., New
York, 1923.

(132) RIETZ, H. L. (edited by), Handbook of Mathematical Statistics, Houghton
Mifflin Co., Boston, 1924.

(133) FISHER, R. A., Statistical Methods for Research Workers, Oliver and
Boyd, Edinburgh and London, 1925.

(134) WHITTAKER, E. T., and G. ROBINSON, The Calculus of Observations,
Blackie &amp; Son, London, 1924.

(135) TSCHUPROW, A. A., Grundbegriffe und Grundprobleme der
Korrelationstheorie, Teubner, Leipzig, 1925.

        <pb n="424" />
        THEORY OF STATISTICS.
(136) NICEFORO, A., La Méthode Statistique, Marcel Giard, Paris, 1925.
(137) SECRIST, H., An Introduction to Statistical Methods, revised edition,
The Macmillan Co., New York, 1925.
(138) CRUM, L. W., and A. C. PATTON, Economic Statistics, A. W. Shaw Co.,
Chicago and New York, A. W. Shaw &amp; Co., Ltd., London, 1925.
(139) DAY, EDMUND E., Statistical Analysis, The Macmillan Co., New
York, 1925.
The two following books on vital statistics are both revised editions,
Newsholme’s book having been completely rewritten.
(140) NEWSHOLME, Sir ARTHUR, The Elements of Vital Statistics, revised
edition, Allan and Unwin, London, 1923.
(141) WHIPPLE, G. C., Vital Statistics, 2nd ed., Wiley &amp; Sons, New York;
Chapman and Hall, London, 1923.
The student of vital statistics who wishes to go on to modern methods
should get Pearl’s book (130) above.

        <pb n="425" />
        ANSWERS
TO, AND HINTS ON THE SOLUTION OF, THE EXERCISES GIVEN.
CHAPTER I.
1. N 26,287     (AB) 887
   (A) 2,308    (AC) 374
   (B) 2,853    (BC) 353
   (C) 749      (ABC) 149
(4B0C) 156 (aBC) 179
(dBy) 431 (aBy) 1,249
(480) 272 (aBO) 163
(4B8y) 759 (aBy) 20,504
3. The frequencies not given in the question itself are—
(a) (AB) 107, (AC) 405, (BC) 525.
(0) (4By) 22,980  (aBy) 13,585  (aBC) 96,478  (aBy) 28,868,495.
: (45) (5) mtn). igs)
(48) (8B) (AB) +(4B)” (B)+(B)
: (4B) (4) : (4B) (4)
that is By tet SB) -B) V-(d)
(4B)_(4)
that is (aB) &gt; ay
5. (AB) + (BC) − (B), i.e., the sum of the excesses of (AB) and (BC) over (B)/2.
8. 160. Take A = husband exceeding wife in first measurement, B =
husband exceeding wife in second measurement, and find (aB).
CHAPTER II.
1. 80/263 or 304 per thousand.
2. 55/85 or 65 per cent.
3. 32 per cent. and 30 per cent.
4. 117.
5. 108.
8. p3 (1-29), p&lt;} (1429), i.e, p must lie between 0 and 1 (1-29) or
between } (1+ 2¢) and 4.
9. As a hint, remember the condition that—
(BC) ≮ (B) + (C) − N.
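The condition in this hint is one of four that any table for two attributes must satisfy. The short Python sketch below checks all four by requiring every ultimate frequency of the fourfold table to be non-negative. The hint's condition is the last of these, with B and C written in place of A and B. The function name `consistent_pair` is hypothetical.

```python
def consistent_pair(n, a, b, ab):
    """Check that N, (A), (B), (AB) can occur together.

    Every ultimate frequency of the fourfold table must be non-negative:
      (AB) >= 0
      (A beta)     = (A) - (AB) >= 0
      (alpha B)    = (B) - (AB) >= 0
      (alpha beta) = N - (A) - (B) + (AB) >= 0, i.e. (AB) >= (A) + (B) - N
    """
    a_beta = a - ab
    alpha_b = b - ab
    alpha_beta = n - a - b + ab
    return min(ab, a_beta, alpha_b, alpha_beta) >= 0


print(consistent_pair(100, 70, 60, 20))  # False: (AB) must be at least 70 + 60 - 100 = 30
print(consistent_pair(100, 70, 60, 30))  # True
```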

        <pb n="426" />
CHAPTER III.

1. Deaf-mutes from childhood per million among males 222; among
females 183 ; there is therefore positive association between deaf-mutism and
male sex : if there had been no association between deaf-mutism and sex, there
would have been 3176 male and 3393 female deaf-mutes.

2. (a) positive association, since (AB)₀ = 1457.

(b) negative association, since 294/490 = 3/5, 380/570 = 2/3.
(c) independence, since 256/768 = 1/3, 48/144 = 1/3.
3. Percentage of Plants above the Average Height.
Parentage Crossed. Self-fertilised.
Ipomea purpurea. : . . 86 per cent. 25 per cent.
Petunia violacea . . EE 17 a
Reseda lutea . . Sa Dahl
Reseda odorata . . «571 AT
Lobelia fulgens . : i500 + 35 a
The association is much less for the species at the end than for those at the
beginning of the list.

4. Percentage of dark-eyed amongst the sons of dark-eyed fathers 39 per
cent.

Percentage of dark-eyed amongst the sons of not dark-eyed fathers 10 per
cent.

If there had been no heredity, the frequencies to the nearest unit would
have been (AB)₀ 18, (Aβ)₀ 111, (αB)₀ 121, (αβ)₀ 750.

5. Percentage of light-eyed amongst the wives of light-eyed husbands 59
per cent.

Percentage of light-eyed amongst the wives of not light-eyed husbands 53
per cent.

If there had been no association: (AB)₀ = 298, (Aβ)₀ = 225, (αB)₀ = 143,
(αβ)₀ = 108.
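Each independence value in Qu. 4 and Qu. 5 is the product of the two relevant totals divided by N. The Python sketch below recomputes them. These answers do not print the totals. They are recovered here by adding the printed expectations, since the independence values share the marginal totals of the observed table. The function name is hypothetical.

```python
def independence_values(n, a, b):
    """(AB)0, (A beta)0, (alpha B)0, (alpha beta)0: product of the totals divided by N."""
    alpha, beta = n - a, n - b
    return [a * b / n, a * beta / n, alpha * b / n, alpha * beta / n]


# Totals recovered by adding the printed expectations (not printed in the answers).
print([round(v) for v in independence_values(1000, 129, 139)])  # Qu. 4: [18, 111, 121, 750]
print([round(v) for v in independence_values(774, 523, 441)])   # Qu. 5: [298, 225, 143, 108]
```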

6. The following are the proportions of the insane per thousand in
successive age groups :—

In general population: 0·9, 2·3, 4·1, 5·7, 6·9, 7·5, 7·7, 6·8.
Amongst the blind: 20·1, 16·0, 16·3, 20·7, 18·3, 17·8, 11·4, 5·3.
Note the diminishing association, which is especially clear in the age-group
65—, and the negative association in the last age-group. The association
coefficient gives the values below, which decrease continuously:—
Association coefficient: +0·92, +0·75, +0·61, +0·57, +0·46, +0·41,
+0·20, −0·13.
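The association coefficients in Qu. 6 can be roughly checked from the rates per thousand. The Python sketch below assumes Q = (ad − bc)/(ad + bc) for a fourfold table built from the two rates. It treats the general population as the comparison group, which only approximates the not-blind, and it uses only the first six age-groups. Its results agree with the printed coefficients to within about 0·01: the third group gives 0·60 against the printed 0·61. The difference presumably comes from rounding of the printed rates.

```python
def association_q(p1, p2):
    """Yule's Q for a fourfold table formed from two rates:
    a = p1, b = 1 - p1 (first group); c = p2, d = 1 - p2 (second group)."""
    ad = p1 * (1 - p2)
    bc = p2 * (1 - p1)
    return (ad - bc) / (ad + bc)


blind = [20.1, 16.0, 16.3, 20.7, 18.3, 17.8]   # insane per thousand among the blind
general = [0.9, 2.3, 4.1, 5.7, 6.9, 7.5]       # insane per thousand, general population
qs = [association_q(b / 1000, g / 1000) for b, g in zip(blind, general)]
print([round(q, 2) for q in qs])  # [0.92, 0.75, 0.6, 0.57, 0.46, 0.41]
```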
CHAPTER IV.
1. (D)/N = 6°9 per cent. (4)/N = 6-8 per cent.
(AD)j(4) =450 (AD)(D) =446
(BD)/(B) = 86 (4p)/(B) = &amp;T
(48D)/(4B) =412 ,, (4BD)/(BD) =54'9
(BD)/(B) =42T (4B)[(B) =292
(4BD)/(AB)=516 ,, (4BD)/(BD)=85'3 ,,
The above give two legitimate comparisons. The general results are the same
as for the boys, i.e. a very small association between development-defects and
dulness amongst those exhibiting nerve-signs, as compared with those who do
        <pb n="427" />
        ANSWERS, ETC., TO EXERCISES GIVEN.
not exhibit nerve-signs, or with the girls in general. As the association
amongst those who do not exhibit nerve-signs is quite as high as for the girls
in general, the “conclusion” quoted does not seem valid.

2.                 (1)           (2)                         (1)           (2)
               per thousand  per thousand               per thousand  per thousand
   (B)/N            3·2           7·5        (A)/N            0·9           4·0
   (AB)/(A)        14·9          11·7        (AB)/(B)         4·0           6·3
   (BC)/(C)        38·8          62·0        (AC)/(C)         6·6          18·8
   (ABC)/(AC)     216           214          (ABC)/(BC)      36·8          63·8
The above give the two simplest comparisons, either of which is sufficient to
show that there is a high association between blindness and mental derange-
ment amongst the deaf-mutes as well as in the general population ; amongst
the old, the association is, in fact, small for the general population, but well-
marked for deaf-mutes. This result stands in direct contrast with that of
Qu. 1, where the association between the two defects A and D was much
smaller in the defective universe B than in the universe at large. As previously
stated, no great reliance can be placed on the census data as to these infirmities.

3. If the cancer death-rates for farmers over 45 and under 45 respectively
were the same as for the population at large, the rate for all farmers 15—
would be 1·11. This is slightly less than the actual rate 1·20, but the excess
would not justify the statement that “farmers were peculiarly liable to cancer.”
It is, in point of fact, due to the further differences of age-distribution that we
have neglected, e.g. amongst those over 45 there are more over 55 amongst
farmers than amongst the general population, and so on.

4. 15 per cent.

6. If A and B were independent in both the C and γ universes, we would have
(AB) equal to

$$\frac{471 \times 419}{617} + \frac{151 \times 139}{383} = 374{\cdot}7.$$

Actually (AB) only = 358. Therefore A and B must be disassociated in one or
both partial universes.
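The expected value in Qu. 6 is a sum over the two partial universes. For each universe, take the number of A times the number of B and divide by that universe's total. The Python sketch below uses the printed figures. The function name is hypothetical.

```python
def expected_ab(universes):
    """(AB) to be expected if A and B were independent within every partial
    universe: the sum of (A)(B)/total, taken universe by universe."""
    return sum(a * b / total for a, b, total in universes)


# (A), (B), and the total in the C and gamma universes, as printed.
universes = [(471, 419, 617), (151, 139, 383)]
print(round(expected_ab(universes), 1))  # 374.7, against an observed (AB) of 358
```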

9. (1) 68·1 per cent. (2) 42·5 per cent. The fallacy discussed in § 2 is
now avoided, and there seems no reason for declining to consider this as evidence
of the effect of expenditure on election results.

10. The limits to y are—

y&lt;i(Bz-22-1)

&gt; H(z +2a?),
subject to the conditions yz, y&lt;{0, y&lt;22-1. No inference of a positive
association from two negatives is possible unless x lies between the limits
·382 . . . and ·618 . . .
11. The limits to y are:—

(1) y&lt;3(6x- 622-1)

&gt; (x + 622),
subject to conditions y&lt;0, {4x -1, pe.

An inference is only possible from positive associations of 4 Band 4C if =p
t ; an inference is only possible from two negative associationsif lies between
211 . . . .and 274. . . . Note that = cannot exceed 3.

(2) y&lt;¥(6x—3x*-1)
&gt; 4(22 + 322),
subject to conditions y&lt;40, (52-1, p=.

        <pb n="428" />
No inference is possible from positive associations of AB and BC.
An inference is only possible from negative associations if x lie between
‘183 . . . .and ‘215... . Note that cannot exceed 1.
(3) y&lt;3(6x— 222-1)
&gt; 3(3x + 222),
subject to the conditions y&lt;:0, &lt;5z-1, pa.

As in (2), no inference is possible from positive associations of AC and BC;
an inference is possible from negative associations if z lie between ‘177 . . . .
and 224 , . , , Note that 2 cannot exceed 3.

CHAPTER V.
1. 4,068. B, 0°36.
CHAPTER VI.
1. 1200; 200. 2. 100; 20. 3. 146·25. 4. 2165.
CHAPTER VII.

2. Mean, 156·73 lb. Median, 154·67 lb. Mode (approx.) 150·6 lb. (Note
that the mean and the median should be taken to a place of decimals further
than is desired for the mode: the true mode, found by fitting a theoretical
frequency curve, is 151·1 lb.)

3. Mean, 0·6330. Median, 0·6391. Mode (approx.), 0·651. (True mode
is 0·653.)
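The approximate modes in Qu. 2 and Qu. 3 follow from an empirical relation for moderately asymmetrical distributions. The mode lies about three times as far from the mean as the median does. A minimal Python sketch:

```python
def approximate_mode(mean, median):
    """Empirical relation for moderately asymmetrical distributions:
    mode = mean - 3 (mean - median)."""
    return mean - 3 * (mean - median)


print(round(approximate_mode(156.73, 154.67), 2))   # Qu. 2: about 150.55 lb., printed as 150.6
print(round(approximate_mode(0.6330, 0.6391), 4))   # Qu. 3: 0.6513, printed as 0.651
```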

4. £35'6 approximately.

5. (1) 116·0. (2) Means 77·4, 89·0, ratio 114·9. (3) Geometrical means 77·2,
88·9, ratio 115·2. (4) 115·2.

6. (1) 921,507. (2) 916,963.

7. 1st qual. 10s. 63d. 2nd qual. 9s. 21d.

8. n.p. If the terms of the given binomial series are multiplied by 0, 1, 2, 3
. . . , note that the resulting series is also a binomial when a common factor
is removed. [The full proof is given in Chapter XV. § 6.]
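The hint in Qu. 8 can be checked numerically. In the Python sketch below (names hypothetical), each term of the binomial is weighted by 0, 1, 2, . . . and the results are added. This gives the same total as np multiplied by the terms of the binomial of index n − 1, and those terms sum to unity.

```python
from math import comb


def binomial_terms(n, p):
    """Terms C(n, r) p^r q^(n - r) for r = 0, 1, ..., n."""
    q = 1 - p
    return [comb(n, r) * p ** r * q ** (n - r) for r in range(n + 1)]


def mean_by_weighting(n, p):
    """Multiply the terms by 0, 1, 2, ... and add."""
    return sum(r * t for r, t in enumerate(binomial_terms(n, p)))


def mean_by_factor(n, p):
    """r C(n, r) p^r q^(n-r) = n p C(n-1, r-1) p^(r-1) q^(n-r): remove the common
    factor n p and what is left is the binomial of index n - 1, whose terms add to 1."""
    return n * p * sum(binomial_terms(n - 1, p))


print(mean_by_weighting(12, 1 / 6), mean_by_factor(12, 1 / 6))  # both equal 2 = n p, up to rounding
```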

CHAPTER VIII.

2. Standard deviation 21·3 lb. Mean deviation 16·4 lb. Lower quartile
142·5, upper quartile 168·4; whence Q = 12·95. Ratios: m.d./s.d. = 0·77,
Q/s.d. = 0·61. Skewness, 0·29.
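The constants in Qu. 2 are linked by simple ratios. The Python sketch below assumes the distribution is the same as in Chapter VII, Qu. 2 (both are in lb.), so the mean and median printed there can be used. Skewness is taken as (mean − mode)/standard deviation, with the approximate mode. Under that assumption the sketch reproduces Q = 12·95, the two ratios, and the skewness 0·29. The function name is hypothetical.

```python
def dispersion_constants(mean, median, sd, mean_dev, lower_q, upper_q):
    """Quartile deviation, ratios to the standard deviation, and skewness."""
    q = (upper_q - lower_q) / 2             # semi-interquartile range
    mode = mean - 3 * (mean - median)       # approximate (empirical) mode
    return {
        "Q": q,
        "md/sd": mean_dev / sd,
        "Q/sd": q / sd,
        "skewness": (mean - mode) / sd,
    }


# Mean and median assumed from Chapter VII, Qu. 2.
c = dispersion_constants(156.73, 154.67, 21.3, 16.4, 142.5, 168.4)
print({k: round(v, 2) for k, v in c.items()})
# {'Q': 12.95, 'md/sd': 0.77, 'Q/sd': 0.61, 'skewness': 0.29}
```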

3. Approximately lower quartile = £26·1, upper quartile = £54·6, ninth
decile = £94.

5. (1) M = 73·2, σ = 17·3. (2) M = 73·2, σ = 17·5. (3) M = 73·2, σ = 18·0.
(Note that while the mean is unaffected in the second place of decimals, the
standard deviation is the higher the coarser the grouping.)

6. √(n.pq). The proof is given in Chapter XV. § 6.

7. The assumption that observations are evenly distributed over the
        <pb n="429" />
intervals does not affect the sum of deviations, except for the interval in which
the mean or median lies: for that interval the sum is n₃(0·25 + d²), hence the
entire correction is

$$d(n_1 - n_2) + n_3(0{\cdot}25 + d^2).$$

In this expression d is, of course, expressed as a fraction of the class-interval,
and is given its proper sign. Notice that the n₁ and n₂ of this question are
not the same as the N₁ and N₂ of § 16.
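The term (0·25 + d²) per observation, for the interval containing the mean or median, can be checked numerically. The Python sketch below assumes a class-interval of unit width centred at zero, with the mean or median at distance d from its centre. The observations are spread evenly by placing each at the centre of its own small sub-interval. The function name is hypothetical.

```python
def mean_abs_deviation_in_interval(d, pieces=10_000):
    """Average of |x - d| over observations spread evenly across a unit
    class-interval centred at 0 (each at the centre of its own sub-interval)."""
    xs = [-0.5 + (i + 0.5) / pieces for i in range(pieces)]
    return sum(abs(x - d) for x in xs) / pieces


for d in (0.0, 0.2, -0.35):
    print(d, round(mean_abs_deviation_in_interval(d), 6), 0.25 + d * d)
```

The integral of |x − d| from −½ to ½ is ½[(½ + d)² + (½ − d)²] = ¼ + d², which the sums approach as the sub-intervals shrink.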
CHAPTER IX.

lL. 0:=1'414, ¢,=2'280, r= +081. X=05¥Y+0'5. }=13X+1l

2. Using the subscripts 1 for earnings, 2 for pauperism, 3 for out-relief ratio,
My=579, 03=809 : rig= — 0°13, 7p3= + 0°60.

CHAPTER XI.

1. 12·32 per cent. (against 12·40 per cent.): 2·556 in. against 2·572 in.

2. The corrected standard-deviation is 0·9954 of the rough value.

3. Estimated true standard-deviation 6·91: standard-deviation of fluctuations
of sampling 9·38. (The latter, which can be independently calculated,
is too low, and the former consequently probably too high. Cf. Chap. XIV.
§ 10.)

4. 0·43.

5. 58 per cent.

6. oo?N/(o® + a®)(og® + 037).

7 dado of ="

ERNE RE
8. 0-30.
ea 2.2 72 2 2

9. dl oy -b 09 + cog Y

The others may be written down from symmetry. :

10. (1) No effect at all. (2) If the mean value of the errors in variables is
d, and in the weights e, the value found for the weighted mean is—

The true value +d — 7.0, 0w0——r.
w(w +e)
If is small, d is the important term, and hence errors in the quantities are
usually of more importance than errors in the weights. If » become consider-
able, errors in the weights may be of consequence, but it does not seem probable
that the second term would become the most important in practical cases,

11. Q = 2/3.

12. HE 077.

CHAPTER XII.

1. r₁₂.₃ = +0·759, r₁₃.₂ = +0·097, r₂₃.₁ = −0·436.

σ₁.₂₃ = 2·64, σ₂.₁₃ = 0·594, σ₃.₁₂ = 70·1.
X₁ = 9·31 + 3·37X₂ + 0·00364X₃.
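The regression coefficients can be checked from the partial correlations and partial standard deviations. They are related by b₁₂.₃ = r₁₂.₃ σ₁.₂₃/σ₂.₁₃, and similarly b₁₃.₂ = r₁₃.₂ σ₁.₂₃/σ₃.₁₂. The Python sketch below applies this to Qu. 1 and to the four-variable case of Qu. 2. It agrees with the printed equations to within the rounding of the printed constants; the largest difference is 0·589 against 0·587. The function name is hypothetical.

```python
def partial_regression(r_partial, sigma_dependent, sigma_independent):
    """Partial regression coefficient from the partial correlation and the two
    partial standard deviations, e.g. b_12.3 = r_12.3 * s_1.23 / s_2.13."""
    return r_partial * sigma_dependent / sigma_independent


# Qu. 1: b_12.3 and b_13.2 (printed as 3.37 and 0.00364)
print(round(partial_regression(0.759, 2.64, 0.594), 2))
print(round(partial_regression(0.097, 2.64, 70.1), 5))
# Qu. 2: b_12.34, b_13.24, b_14.23 (printed as 0.127, 0.587, 0.0345)
print([round(partial_regression(r, 9.17, s), 4)
       for r, s in ((0.680, 49.2), (0.803, 12.5), (0.397, 105.4))])
```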
        <pb n="430" />

2. r₁₂.₃₄ = +0·680, r₁₃.₂₄ = +0·803, r₁₄.₂₃ = +0·397.
r₂₃.₁₄ = −0·433, r₂₄.₁₃ = −0·553, r₃₄.₁₂ = −0·149.
σ₁.₂₃₄ = 9·17, σ₂.₁₃₄ = 49·2, σ₃.₁₂₄ = 12·5, σ₄.₁₂₃ = 105·4.

X₁ = 53 + 0·127X₂ + 0·587X₃ + 0·0345X₄.

3. The correlation of the pth order is r/(1 + pr). Hence, if r be negative,
since the correlation of order n − 2 cannot be numerically greater than unity,
r cannot exceed (numerically) 1/(n − 1).
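The result can be checked by eliminating one variable at a time with the usual first-order formula, r₁₂.₃ = (r₁₂ − r₁₃r₂₃)/√[(1 − r₁₃²)(1 − r₂₃²)]. When every coefficient equals r, all coefficients of each order stay equal. Each step therefore turns a common value s into s/(1 + s). The Python sketch below compares this repeated elimination with the closed form. The names are hypothetical.

```python
from math import sqrt


def first_order(r12, r13, r23):
    """Partial correlation r_12.3 from the three zero-order coefficients."""
    return (r12 - r13 * r23) / sqrt((1 - r13 ** 2) * (1 - r23 ** 2))


def by_elimination(r, p):
    """Partial of order p when every zero-order correlation equals r.

    All coefficients of a given order are then equal (to s, say), so each
    elimination step is first_order(s, s, s), which simplifies to s / (1 + s).
    """
    s = r
    for _ in range(p):
        s = first_order(s, s, s)
    return s


def closed_form(r, p):
    """The printed result: r / (1 + p r)."""
    return r / (1 + p * r)


for r, p in ((0.5, 3), (-0.2, 3)):
    print(r, p, by_elimination(r, p), closed_form(r, p))
```

Take five variables, so the highest order is n − 2 = 3. A common value r = −0·3 gives a closed-form coefficient of about −3, which is impossible. Indeed 0·3 exceeds the bound 1/(n − 1) = 0·25.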

4, TE 719

5. r₁₂.₃ = −1; r₁₃.₂ = r₂₃.₁ = +1.

6. r₁₂.₃ = r₁₃.₂ = r₂₃.₁ = −1.

CHAPTER XIII.

1. Theo. M = 6, σ = 1·732: Actual M = 6·116, σ = 1·732.

2. (a) Theo. M = 2·5, σ = 1·118: Actual M = 2·48, σ = 1·14.
(By, M=5, ‘g=1225. Y= 29] o=120.
(eS), eM =313 Rr=11 3230: SNM = 3:47 So = 1-40.

3. Theo. M = 50, σ = 5: Actual M = 50·11, σ = 5·23.
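The theoretical constants in Qu. 1 to Qu. 3 are the binomial mean np and standard deviation √(npq). In the Python sketch below, the values n = 12, 5, and 100 with p = ½ are an assumption. They are inferred from the printed theoretical constants and are not stated in these answers.

```python
from math import sqrt


def binomial_constants(n, p):
    """Theoretical mean n p and standard deviation sqrt(n p q)."""
    return n * p, sqrt(n * p * (1 - p))


for n in (12, 5, 100):  # assumed numbers of trials, each with p = 1/2
    mean, sd = binomial_constants(n, 0.5)
    print(n, mean, round(sd, 3))
```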

4. The standard deviation of the proportion is 000179, and the actual
divergence is 54 times this, and therefore almost certainly significant.

5. The standard deviation of the number drawn is 32, and the actual
difference from expectation 18. There is no significance.

6. p = 1 − σ²/M, n = M/p: p = 0·510, n = 120: p = 0·454, n = 110·4.
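The formulae in Qu. 6 come from equating the observed mean and standard deviation to np and √(npq). Since σ² = Mq, this gives p = 1 − σ²/M and n = M/p. The Python sketch below (function name hypothetical) checks them on a case where the answer is known.

```python
def fit_binomial(mean, sd):
    """Moment estimates for a binomial: sd**2 = n p q = mean * q, so
    p = 1 - sd**2 / mean and n = mean / p."""
    p = 1 - sd ** 2 / mean
    return p, mean / p


# Check on the exact constants of 12 trials with p = 1/2 (mean 6, sd sqrt(3)).
print(fit_binomial(6, 3 ** 0.5))  # p close to 0.5, n close to 12
```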

8. Standard deviation of simple sampling 230 per cent. The actual
standard-deviation does not, therefore, seem to indicate any real variation, but
only fluctuations of sampling.

9. Difference from expectation 7·5: standard error 10·0. The difference
might therefore occur frequently as a fluctuation of sampling.

10. The test can be applied either by the formulæ of Case II. or Case III.
Case II. is taken as the simplest.

(a) (AB)/(B) = 69·1 per cent.: (Aβ)/(β) = 80·0 per cent. Difference 10·9
per cent. (A)/N = 71·1 per cent., and thence ε₁₂ = 12·9 per cent. The actual
difference is less than this, and would frequently occur as a fluctuation of
simple sampling.

(b) (AB)/(B) = 70·1 per cent.: (Aβ)/(β) = 64·3 per cent. Difference 5·8 per
cent. (A)/N = 67·6 per cent., and thence ε₁₂ = 3·40 per cent. The actual
difference is 1·7 times this, and might, rather infrequently, occur as a fluctua-
tion of simple sampling.

CHAPTER XIV.
Row. Ope Group of Rows. ape
1 31 5, 6, and 7 2:1
2 2:1 8,9, 10, and 11 16
3 17 12, 13, and 14 1-2
1 7 15 and upwards 11

σₚ is given in units per 1000 births, as s and s₁.

2. s₁ = 7·02, and σₚ = 2·5 units.

3. ¢2=n.pq as if the chance of success were p in all cases (but the mean is
n/2 not p.n).

4. Mean number of deaths per annum = ¢,*=680,

a2=566,582, r=0°000029.

        <pb n="431" />
CHAPTER XV.
1. (1)   0       1          7      792
         1      12          8      495
         2      66          9      220
         3     220         10       66
         4     495         11       12
         5     792         12        1
         6     924
                        Total,   4096

   (2)   0     459·4        5      116·4
         1    1102·6        6       27·2
         2    1212·8        7        4·7
         3     808·6        8        0·6
         4     363·9
                        Total,   4096·2

   (3)   0     192
         1     288
         2     144
         3      24
   Total,      648
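The three series can be reproduced with exact fractions. From the printed totals, the Python sketch below assumes they are 4096(½ + ½)¹², 4096(5/6 + 1/6)¹², and 648(2/3 + 1/3)³. The first and third series come out exactly. The second agrees with the printed values to within 0·1 in the last figure. The function name is hypothetical.

```python
from fractions import Fraction
from math import comb


def expansion(total, p, n):
    """Terms of total * (q + p)**n for r = 0, 1, ..., n successes, with q = 1 - p."""
    q = 1 - p
    return [total * comb(n, r) * p ** r * q ** (n - r) for r in range(n + 1)]


coins = expansion(4096, Fraction(1, 2), 12)   # series (1)
dice = expansion(4096, Fraction(1, 6), 12)    # series (2)
threes = expansion(648, Fraction(1, 3), 3)    # series (3)

print([int(t) for t in coins])    # [1, 12, 66, 220, 495, 792, 924, 792, 495, 220, 66, 12, 1]
print([round(float(t), 1) for t in dice[:9]])
print([int(t) for t in threes])   # [192, 288, 144, 24]
print(sum(coins), sum(dice), sum(threes))  # 4096 4096 648
```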

2. The frequency of r successes is greater than that of r—1 so long as
r&lt;np+p: if np is an integer, r =np gives the greatest term and also the mean.
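The statement rests on the ratio of successive terms, (n − r + 1)p/(rq), which exceeds unity exactly when r is less than np + p. The Python sketch below (names hypothetical) compares neighbouring terms directly.

```python
from math import comb


def term(n, p, r):
    """Binomial frequency (per unit total) of r successes in n trials."""
    return comb(n, r) * p ** r * (1 - p) ** (n - r)


def rises_at(n, p, r):
    """Whether the frequency of r successes exceeds that of r - 1.

    The ratio of the two terms is (n - r + 1) p / (r q), which exceeds unity
    exactly when n p + p is greater than r.
    """
    return term(n, p, r) > term(n, p, r - 1)


def greatest_term(n, p):
    """Number of successes with the greatest frequency."""
    return max(range(n + 1), key=lambda r: term(n, p, r))


print(greatest_term(12, 0.5), greatest_term(12, 1 / 6))  # 6 2
```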

3. This follows at once from a consideration of the Galton-Pearson apparatus.

4.   Binomial.    Normal curve.
         1             1·7
        10            10·5
        45            42·7
       120           116·1
       210           211·5
       252           258·4
       210           211·5
       etc.          etc.
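The normal-curve column can be recomputed as the ordinates of a normal curve with the same total, mean, and standard deviation as the binomial. From the binomial column, the Python sketch below assumes the series is (1 + 1)¹⁰, with total 1024, mean 5, and standard deviation equal to the square root of 2·5.

```python
from math import comb, exp, pi, sqrt

n, p = 10, 0.5
total = 2 ** n                           # 1024, the sum of the binomial coefficients
mean, sd = n * p, sqrt(n * p * (1 - p))  # 5 and sqrt(2.5)
y0 = total / (sd * sqrt(2 * pi))         # central ordinate of the normal curve

binomial = [comb(n, r) for r in range(n + 1)]
ordinates = [y0 * exp(-((r - mean) ** 2) / (2 * sd ** 2)) for r in range(n + 1)]

for r in range(n + 1):
    print(r, binomial[r], round(ordinates[r], 1))
```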
5. The data are M = 68·855, σ = 2·56, y₀ = 155·8.
6. (1) United Kingdom—direct 1·75, from standard-deviation 1·73.
(2) Cambridge students—direct 1·68, from standard-deviation 1·73.

7. 70·6 per cent. 8. 27 per cent.

9. (1) In a 12·4 per cent., b 10 per cent. of the trials, assuming normality,
but the assumption is hardly quite valid. (2) a about 13 times in 100,000
trials; b practically impossible, being a deviation of over 7 times the standard
error.

10. 853. 11. Mean 74·3, standard-deviation 3·23.

CHAPTER XVI.

3. From equations (10) and (11) replace oy and oy by =; and =, in equation
(9). Regarding this as an equation for r, note that r² is a maximum when
tan 2θ is infinite, or θ = 45°.

        <pb n="432" />

4. In fig. 50, suppose every horizontal array to be given a slide to the right
until its mean lies on the vertical axis through the mean of the whole distribution:
then suppose the ellipses to be squeezed in the direction of this vertical
axis until they become circles. The original quadrant has now become a
sector with an angle between one and two right angles, and the question is
solved on determining its magnitude.
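The construction leads to Sheppard's theorem. The Python sketch below assumes a normal correlation surface with both variables in standard measure and medians at zero. After the slide and the squeeze, the quadrant becomes a sector of angle π/2 + arcsin r. The share of the frequency in it is therefore ¼ + arcsin r/(2π). The simulation is only a rough check, and the function names are hypothetical.

```python
import math
import random


def quadrant_proportion(r):
    """Sheppard's theorem: share of a normal correlation surface lying in the
    quadrant where both deviations from the medians are positive."""
    return 0.25 + math.asin(r) / (2 * math.pi)


def simulate(r, n=200_000, seed=1):
    """Monte Carlo estimate of the same share for correlation r."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        x = rng.gauss(0, 1)
        y = r * x + math.sqrt(1 - r * r) * rng.gauss(0, 1)
        hits += x > 0 and y > 0
    return hits / n


print(round(quadrant_proportion(0.6), 4), round(simulate(0.6), 4))
```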

CHAPTER XVII.

1. Estimated frequency 1554, standard error 0·28 lb. 2. Lower quartile,
frequency 1472, standard error 0·26 lb.; upper quartile, frequency 1116, standard
error 0·34 lb. 3. 0·18 lb. 4. 0·24 lb., 17 per cent. less than the standard
error of the median. 5. 0·0196 in. or 0·76 per cent. of the standard-deviation:
the standard error of the semi-interquartile range is 1·23 per cent. of that
range.

6.    r      n = 100.    n = 1000.

     0·0     0·1         0·0316
     0·2     0·096       0·0304
     0·4     0·084       0·0266
     0·6     0·064       0·0202
     0·8     0·036       0·0114
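The figures agree with the usual large-sample formula for the standard error of a correlation coefficient, (1 − r²)/√n. The Python sketch below reproduces the table. The function name is hypothetical.

```python
from math import sqrt


def standard_error_r(r, n):
    """Large-sample standard error of a correlation coefficient: (1 - r**2) / sqrt(n)."""
    return (1 - r * r) / sqrt(n)


for r in (0.0, 0.2, 0.4, 0.6, 0.8):
    print(r, round(standard_error_r(r, 100), 3), round(standard_error_r(r, 1000), 4))
```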

        <pb n="433" />
        INDEX.
[The references are to pages. The subject-matter of the Exercises given at
the ends of the chapters has been indexed only when such exercises (or
the answers thereto) give the constants for statistical tables in the text,
or theoretical results of general interest ; in all such cases the number of
the question cited is given. In the case of authors’ names, citations in
the text are given first, followed by citations of the authors’ papers or
books in the lists of references.]
ABILITY, general, refs., 388.
Accident, deaths from (law of small chances), 265-266.
Achenwall, Gottfried, Abriss der Staatswissenschaft, 2.
Ages, at death of certain women (table), 78; of husband and wife (correlation), 159; diagram, 173; constants (qu. 3), 189.
Aggregate, of classes, 10-11.
Agricultural labourers’ earnings. See Earnings.
Agriculture, experiment, errors in, refs., 396.
Airy, Sir G. B., use of terms “error of mean square” and “modulus,” 144. Refs., Theory of Errors of Observation, 360.
Ammon, O., hair and eye-colour data cited from, 61.
Anderson, O., correlation difference method, 198; refs., 208, 392.
Annual value of dwelling-houses (table), 83; of estates in 1715, table, 100; diagram, 101.
Arithmetic mean. See Mean, arithmetic.
Array, def., 164; standard-deviation of, 177, 204-205, 236-237, in normal correlation, 319-321.
Association, generally, 25-59; def., 28; degrees of, 29-39; testing by comparison of percentages, 30-35; constancy of difference from independence values for the second-order frequencies, 35-36; coefficients of, 37-39; illusory or misleading, 48-51; total possible number of, for n attributes, 54-56; case of complete independence, 56-57; use of ordinary correlation-coefficient as measure of association, 216-217; Pearson's coefficient based on normal correlation (refs.), 40, 333; refs., 15, 39-40, 333.
Association, partial, generally, 42-59; the problem, 42-43; total and partial, def., 44; arithmetical treatment, 44-48; testing, in ignorance of third-order frequencies, 51-54; refs., 57.
— examples: inoculation against cholera, 31-32, 34-35, 383-384; deaths and occupation, 52-53; deaf-mutism and imbecility, 32-33; eye-colour of father and son, 33-34; eye-colour of grandparent, parent, and offspring, 46-48, 53-54; colour and prickliness of Datura fruits, 36-37, 377-378; defects in school children, 45-46.
Asymmetrical frequency-distributions, 90-102; relative positions of mean, median, and mode in, 121-122; diagrams, 113-114. See also Frequency-distributions.
Asymmetry in frequency-distributions, measures of, 107, 149-150.
Attributes, theory of, generally, 1-59; def., 7; notation, 9-10, 14-15; positive and negative, 10;
        <pb n="434" />

order and aggregate of classes, 10-11; ultimate classes, 12; positive classes, 13-14; consistence of class-frequencies, 17-24 (see Consistence); association of, 25-59, 377-381 (see Association); sampling of, 254-334 (see Sampling of attributes).
Averages, generally, 106-132; def., 107; desirable properties of, 107-108; forms of, 108; average in sense of arithmetic mean, 109; refs., 129-130. See Mean, Median, Mode.
Axes, principal, in correlation, 321-322.
BACHELIER, L., refs., Calcul des probabilités, 396; Le jeu, la chance, et le hasard, 396.
Barlow, P., tables of squares, etc., 67. Refs., 357.
Barometer heights, table, 96; diagram, 97; means, medians, and modes, 122.
Bateman, H., refs., law of small chances, 273.
Bateson, W., data cited from, 37, 380-381.
Beaven, E. S., refs., yield trials, 396.
Beetles (Chrysomelidæ), sizes of genera, 363-364.
Beeton, Miss M., data cited from, 78.
Bennett, T. L., refs., cost of living, 390.
Bernoulli, J., refs., Ars Conjectandi, 360.
Berry, R. A., refs., variation in mangels, 396; errors in feeding experiments, 396.
Bertillon, J., ref., Cours élémentaire de statistique, 6, 359.
Bertrand, J. L. F., refs., Calcul des probabilités, 360.
Betz, W., ref., Ueber Korrelation, 360.
Bias in sampling, 261-262, 279-281, 336-337, 343, 353.
— in scale-reading, 362-363.
Bielfeld, Baron J. F. von, use of word “statistics,” 1.
Binomial series, 291-300; genesis of, in sampling of attributes, 291-293; calculated series for different values of p and n, 294, 295; experimental illustrations of, 258, 259 (qu. 1 and qu. 2), 274, 371; graphic method of forming a representation of series, 295-297; mechanical method of forming a representation of series, 297-299; refs., 313; direct determination of mean and standard-deviation, 299-300; deduction of normal curve from, 301-302; refs., 314.
Bispham, J. W., refs., errors of sampling in partial correlations, 395.
Blakeman, J., refs., tests for linearity of regression, 209, 354; probable error of contingency coefficient, 354.
Boole, G., refs., Laws of Thought, 23.
Booth, Charles, on pauperism, 193, 195.
Borel, E., refs., Théorie des probabilités, 360.
Bortkewitsch, L. von, law of small chances, 265-266, 370; time-distributions, 389; refs., law of small chances, 273, 393.
Bowley, A. L., refs., effect of errors on an average, 356; on sampling, 354; Measurement of Groups and Series, 354; Elements of Statistics, 360, 396; Elementary Manual of Statistics, 360; cost of living, 390; index numbers, 391; Prices and Wages, 1914-20, 391.
— and L. R. Connor, statistical grouping and formulæ, ref., 394.
Bravais, A., refs., correlation, 188, 332.
British Association, data cited from, stature, 88; weight, 95, see Stature; Weight; Reports on index-numbers; refs., 130-131. Address by A. L. Bowley on sampling, 354.
Brown, J. W., refs., index-correlations, 266, 252.
Brown, W., refs., effect of experimental errors on the correlation-coefficient, 226; The Essentials of Mental Measurement, 360, 396.
Brownlee, J., refs., frequency curves (epidemiology), 394.
Bruns, H., refs., Wahrscheinlichkeitsrechnung und Kollektivmasslehre, 360.
        <pb n="435" />
Brunt, D., refs., The Combination of Observations, 397.
CAVE, BEATRICE M., correlation difference method, 198; refs., 208.
Cave-Browne-Cave, F. E., correlation difference method, 198; refs., 208.
Census (England and Wales), tabulation of infirmities in, 14-15; data as to infirmities cited from, 32-33; classification of occupations, as example of a heterogeneous classification, 72; classification of ages, 80, and refs., 105; data as to ages of husbands and wives cited from, 159.
Chance, in sense of complex causation, 30; of success or failure of an event, 256.
Chances, law of small, 265-266, 363-367; refs., 273, 393.
Charlier, C. V. L., refs., theory of frequency curves, resolution of a compound normal curve, 314, 315, 393.
Childbirth, deaths in, application of theory of sampling, 282-284.
Cholera and inoculation, illustrations, 31-32, 34-35, 383-384.
Chrysomelidæ, distribution of size of genus, 363-364.
Church, A. E. R., refs., probable errors, 395.
Class, in theory of attributes, 8; class symbol, 9; class-frequency, 10; positive and negative classes, 10; ultimate classes, 12; order of a class, 10.
Classification, generally, 8; by dichotomy, def., 9; manifold, 60-74, 76; homogeneous and heterogeneous, 71-72; of a variable for frequency-distribution or correlation table, 76, 80-81, 157, 164.
Class-interval, def., 76; choice of magnitude and position, 79-80, 362-363; desirability of equality of intervals, 76, 82-83; influence of magnitude on mean, 113-114, 115, 116; on standard deviation, 140, 212.
Cloudiness at Breslau, frequency-distribution, 103; diagram, 104.
Coefficient, of association, 37-39; of contingency, 64-67; of variation, 149, standard error, 351; of correlation, see Correlation.
Collins, H. S., refs., agricultural experiments, 396.
Colours, naming a pair, example of contingency, 379-380.
Connor, R. I. See Bowley.
Consistence, of class-frequencies for attributes, generally, 17-24; def., 18-19; conditions, for one or two attributes, 20; for three attributes, 21-22; refs., 23.
— of correlation-coefficients, 250-251.
Contingency tables, def., 60; treatment of, by elementary methods, 61-63; isotropy, 68-71, 328-331, testing of divergence from independence, 378-380.
— coefficient of, 64-67; application to correlation tables, 167, (qu. 3) 189; standard error of (refs.), 355, 395; partial or multiple contingency (refs.), 390.
Contrary classes and frequencies (for attributes), 10; case of equality of contrary frequencies (qu. 6, 7, 8), 16; (qu. 8), 24; (qu. 7, 8, 9), 59.
Correction of death-rates, etc., for age and sex-distribution, 223-225; refs., 226, 392.
— of standard-deviation for grouping of observations, 211-212; refs. (including correction of moments generally), 225.
Correction of correlation-coefficient for errors of observation, 213-214; refs., 225-226, 391-392.
Correlation, generally, 157-253; construction of tables, 164; representation of frequency-distribution by surface, 165-167; treatment of table by coefficient of contingency, 167; correlation-coefficient, 170-174, def. 174, direct deduction, 231-233; regressions, 175-177, direct deduction, 365-366, def. 175; standard-deviations of arrays, 177, 204, 205; calculation of coefficient for ungrouped data, 177-181, for a grouped table, 181-188; between movements of two variables, difference method, 197—

        <pb n="436" />
199, fluctuation method, 199-201; refs., 208-209, 391-392, 360, 397; elementary methods for cases of non-linear regression, 201-202; rough methods for estimating coefficient, 202-204; correlation-ratio, 204-207, 252; effect of errors of observation on the coefficient, 213-214; correlation between indices, 215-216; coefficient for a fourfold table, direct, 216-217, on assumption of normal correlation (Pearson’s coefficient) (refs.), 40, 333, 390; for all possible pairs of N values, 217-218; correlation due to heterogeneity of material, 218-219; effect of adding uncorrelated pairs to a given table, 219-220; application to theory of weighted mean, 221-223; correlation in theory of sampling, 271, 286-289, 342, 349-350; standard error of coefficient, 352. Refs., 188, 208-209, 225-226, 390, 391, 392. For Illustrations, Normal, Partial, Ratio, see below.
Correlation, Illustrations and Examples, correlation between:—
   Two diameters of a shell (Pecten), 158; constants (qu. 3), 189.
   Ages of husband and wife, 159; diagram, 173; constants (qu. 3), 189.
   Statures of father and son, 160; diagrams, facing 166, 174; constants (qu. 3), 189; correlation-ratios, 206-207; testing normality of table, 322-328; diagram of diagonal distribution, 325; of contour-lines fitted with ellipses of normal surface, 327.
   Fertility of mother and daughter, 161, 195-196; diagram, 175; constants (qu. 3), 189.
   Discount rates and percentage of reserves on deposits, 162; diagram, facing 166.
   Sex-ratio and numbers of births in different districts, 163, 175; diagram, 176; constants (qu. 3), 189; correlation-ratios, 207; standard-deviations of arrays and comparison with theory of sampling (qu. 7) 275 and (qu. 1) 289.
   Earnings of agricultural labourers, pauperism and out-relief, 177-181; constants, (qu. 2) 189, 239; correlation-ratios, 207; treatment by partial correlation, 239-241; geometrical representation, 245-247.
   Old-age pauperism and out-relief, 182-185.
   Changes in pauperism, out-relief, proportion of old and population, 192-195; partial correlation, 241-245.
   Lengths of mother- and daughter-frond in Lemna minor, 185-187.
   Weather and crops, 196-197.
   Movements of infantile and general mortality, 197-199.
   Movements of marriage-rate and foreign trade, 199-201.
Correlation, normal, 317-334; deduction of expression for two variables, 318-319; constancy of standard-deviation of arrays and linearity of regression, 319-320; contour lines, 320-321; normality of linear functions of two normally distributed variables, 321; principal axes, 321-322; testing for normality of correlation table for stature, 312-328; isotropy of normal correlation table, 328-331; outline of theory for any number of variables, 331-332; coefficient for a normal distribution grouped to fourfold form round medians (Sheppard’s theorem), (qu. 4) 334; applications to theory of qualitative observations (refs.), 333. Refs., 332-333, 390, 391.
— partial, 229-253; the problem, partial regressions and correlations, 229-231; direct deduction, 365-366; notation and definitions, 233-234; normal equations, fundamental theorems on product-sums, 234-235; significance of generalised regressions and correlations, 236; reduction of standard-deviation, 236-237; of regression, 237-238; of correlation, 238; arithmetical treatment, 238-245; representation by a model, 245-247; coefficient of

        <pb n="437" />
        INDEX. 411
n-fold correlation, 247-249; ex- occupation (partial correction for
pression of correlations and re- age-distribution), 52-53 ; in Eng-
gressions in terms of those of land and Wales, 1881-1890, table,
higher order, 249-250; consist- 77; from diphtheria, table, 98,
ence of coefficients, 250-251; diagram, 97 ; infantile and gene-
fallacies, 251-252 ; limitations in ral, correlation of movements,
interpretation of the partial cor- 197-199 ; standardisation of, for
relation-coefficient, partial associa- age and sex-distribution, 52-53,
tion and partial correlation, 252 ; 223-225, refs., 226, 392; applica-
partial correlation in case of nor- tions of theory of sampling—
mal distribution of frequency, deaths from accident, 265-266.
331-332. Refs., 252-253, 332-333, deaths in childbirth, 282-284,
392. deaths from explosions in mines,

Correlation-ratio, 204-207 ; standard 287-288; inapplicability of the
error, 352; refs., 209; partial, theory of simple sampling, 260-
252, and refs., 252, 392. 261, 282-284, 285-286, 287-288 ;

Cosin, values of estates in 1715, 100. criteria (refs.), 390.

Cost of living, refs., 390-391. Deciles, 150-152; standard error of,

Cotsworth, M. B., refs., multiplica- 337-341.
tion table, 358. Defects : in school children, associa-

Cournot, A. A., refs., theory of prob- tion of, 12, 45-46, refs., 15; cen-
ability, 361. sus tabulation of, 14-15.

Crawford, G. E., refs., proof that De Morgan, A., refs., Formal Logic,
arithmetic mean exceeds geo- 23; Theory of Probabilities, 361.
metric, 130. Detlefsen, J. A., refs., fluctuations of

Crelle, A. L., refs., multiplication sampling in Mendelian population,
table, 358. 392.

Crops and weather, correlation, 196- Deviation, mean, 134; generally,
197. 144-147 : def., 144 ; is least round

Crum, L. W., refs., Economic Statis- the median, 144-145; refs., 154;
tics, 398. calculation of, 145-146, (qu. 7)

Cunningham, E., ref., omega-func- 155-156 ; comparison of advan-
tions, 314. tages with standard-deviation,

Czuber, E., refs., Wahrscheinlich- 146 ; of magnitude with standard-
keitsrechnung, 361; Die statis- deviation, 146-147; of normal
tische Forschungsmethode, 397. curve, 304.

Deviation, quartile. See Quartiles.

DARBISHIRE, A. D., data cited from, — root-mean-square. See Devia-
128, 265. Refs., illustrations of tion, standard.
correlation, 188, 273. — standard, 134-144; def., 134;

Darwin, Charles, data cited from, relation to root-mean-square de-
269-270. viation from any origin, 134-135 ;

Datura, association between colour is the least possible root-mean-
and prickliness of fruit, 37, 38, square deviation, 135; little
(qu. 10) 275, 380-381. affected by small errors in the

Davenport, C. B., data as to Pecten mean, 135; calculation for un-
cited from, 158. Refs., statistical grouped data, 135-137, for a
tables, 358. grouped distribution, 138-141;

Day, E. E., refs., Statistical Analysis, influence of grouping, 140, 211-
398. 212; range of six times the s.d.

Deaf-mutism, association with im- contains the bulk of the observa-
becility, 33-34, 38; frequency tions, 140-142, 309; of a series
amongst offspring of deaf-mutes, compounded of others, 142-143 ;
table, 104. of N consecutive natural numbers,

Deaths, death-rates, association with 143 ; of rectangle, 143 ; of arrays

        <pb n="438" />
        412 THEORY OF STATISTICS.
in theory of correlation, 177, 204, tion, 188, 252, 333 ; law of error
205, 319-320; of generalised de- (normal law) and frequency-
viations (arrays), 234, 236-237 ; curves, generally, 273, 314, 393 ;
other names for, 144 ; of a sum theory of sampling, probable
or difference, 210-211 ; effect of errors, etc., 273, 354 ; dissection
errors of observation on, 211; of of normal curve, 315.
an index, 214-215; of binomial Elderton, E. M., refs., variate differ-
series, 299-300. For standard- ence method, 392.
deviations of sampling, see Error,  Elderton, W. Palin, refs., calculation
standard. of moments, 154 ; table of powers,
De Vries, H., data cited from, 102. 358; tables for testing fit, 354,
Dice, records of throwing, 258-259, 358; Frequency Curves and Cor-
(qu. 1, 2, 3) 274, 371 ; testing for relation, 154, 361, 397.
significance of divergence from  Engledow, F. L., refs., yield trials,
theory, 267, 373-376 ; refs., 273. 396.
Dickson, J. D. Hamilton, normal Error, law of ; errors, curve of. See
correlation surface, 328. Refs. Normal curve.
normal correlation, 333. — mean, 144.
Difference method in correlation, — mean square, 144.
197-199 ; refs., 226, 252, 392. — of mean square, 144.
Diphtheria, ages at death from, — probable, in sense of semi-inter-
table, 98 ; diagram, 97. quartile range, 147 ; in theory of
Discounts and reserves in American sampling, 310-311. For general
banks, table, 163 ; diagram, facing references, see Error, standard.
166. — standard, def., 267 ; of number
Dispersion, measures of, 107, 133- or proportion of successes in n
156 ; unsuitability of range as events, 256-257 ; when numbers
a measure, 123; relative, 149; in samples vary, 264-265; when
refs., 154. See Deviation, mean ; chance of success or failure is
Deviation, standard ; Quartiles. small, 265-266; of percentiles
Distribution of Frequency. See (median, quartiles, etc.), 337-341 ;
Frequency-distribution. of arithmetic mean, 344-350 ; of
Doodson, A. T., refs., mode, median, standard-deviation and coefficient
and mean, 390. of variation, 351; of coefficients
Duckweed, correlation between, of correlation and regression, 352 ;
mother- and daughter-frond, 185- of correlation-ratio and test for
187. linearity of regression, 352 ; refs.,
Duffell, J. H., ref., tables of gamma- 273, 289, 354-355, 395. See also
function, 358. Sampling, theory of.
Duncker, G., relation between geo-  — theory of. See Sampling, theory
metric and arithmetic mean (qu. of.
9), 156. Estates, annual value of. See Value.
Everitt, P. F., refs., tables for calcu-
EARNINGS of agricultural labourers : lating Pearson’s coefficient for a
calculation of standard-deviation, fourfold table, 358.
135-137 ; mean deviation, 145; Exclusive and inclusive notations for
quartiles, 147; correlation with statistics of attributes, 14-15.
pauperism and out-relief, 177-181, Explosions in coal-mines, deaths
constants, (qu. 2) 189, 239; dia- from, as illustrating theory of
gram, 180 ; by partial correlation, sampling, 288.
239-247 ; diagram of model, 246.  Eye-colour, association between
Edgeworth, F. Y., dice-throwings father and son, 34-35, 38, 70-71 ;
(Weldon), 258 ; probable error of association between grandparent,
median, etc., 344. Refs., Index- parent, and child, 46-48, 53-54 ;
numbers, 130-131, 391 ; correla- contingency with hair-colour, 61,

        <pb n="439" />
63, 66-68; non-isotropy of con- tion tables, 380-383 ; aggregate of
tingency table, for father and son, tables, 383-384; experimental
70-71. illustrations, 384-387; P-table
for use with association tables,
FALKNER, R. P., refs., translation of 388-389; refs., 315, 391, 394;
Meitzen’s Theorie der Statistik, 6. tables for, 358.
Fallacies, in interpreting associations Fluctuation, measure of dispersion,
—theorem on, 48-49, illustrations, 144.
49-51 ; owing to changes of classi- Flux, A. W., refs., measurement of
fication, actual or virtual, 72; in price-changes, 390.
interpreting correlations—“spuri- Forcher, H., refs., Die statistische
ous” correlation between indices, Methode als selbständige Wissen-
215-216; correlation due to schaft, 397.
heterogeneity of material, 218- Fountain, H., ref., index-numbers of
219 ; difference of sign of total prices, 131.
and partial correlations, 251-252. Frequency of a class, 10, 76.
Fay, E. A., data cited from Mar- Frequency-curve, def., 87; ideal
riages of the Deaf in America, 104. forms of, 87-105; normal curve
Fechner, G. T., refs., frequency-dis- (q.v.), 301-313; refs., 105, 314,
tributions, averages, measures of 393-394.
dispersion, etc., 129, 154; Kol- Frequency-distributions, 76 ; forma-
lektivmasslehre, 129, 314, 361. tion of, 79-83 ; graphic represen-
Fecundity of brood-mares, table, 96 ; tation of, 83-87; ideal forms—
diagram, 94 ; mean, median, and symmetrical, 87-90, moderately
mode, (qu. 3) 131; inheritance asymmetrical, 90-98, extremely
(ref.), 208, 226. asymmetrical (J-shaped), 98-102,
Feeding trials, errors in, refs., 396. 363-364, U-shaped, 102-105; bi-
Fertility of mother and daughter, nomial series, 291-300; hyper-
correlation, 161, 195-196; dia- geometrical series (ref.), 289 ; nor-
gram, 175 ; constants, (qu. 3) 189; mal curve, 301-313; theoretical
ref., 208, 226. forms, refs., 289, 314, 393-394;
Field trials, errors in, ref., 396. testing goodness of fit, 373-376.
Filon, L. N. G., ref., probable errors, See Binomial series; Normal
354. curve ; Correlation, normal.
Fisher, A., refs., Mathematical Theory — illustrations : of death-rates in
of Probabilities, 397. England and Wales, 77; of ages
Fisher, Irving, refs., index-numbers, at death of certain women, 78 ; of
390. stigmatic rays on poppies, 78; of
Fisher, R. A., use of term “variance,” annual values of dwelling-houses
144 ; testing goodness of fit, 378, in Great Britain, 83; of head-
387 ; refs., goodness of fit in con- breadths of Cambridge students,
tingency tables, 394 ; of regression 84 ; of statures of males in the
lines, 391; errors of sampling in U.K., 88, 90; of pauperism in
correlation-coefficient, 354, 394 ; different districts of England and
probable errors, 395; Statistical Wales, 93 ; of weights of males in
Methods for Research Workers, 397. the U.K., 95; of fecundity of
Fit of a theoretical to an actual brood-mares, 96; of barometer
frequency - distribution, testing, heights at Southampton, 96; of
generally, 370-389; comparison ages at death from diphtheria, 98 ;
frequencies given a priori, 370- of annual values of estates, 100;
378 ; cautions, 373-376; experi- of petals in Ranunculus bulbosus,
mental illustration, 377-378 ; com- 102; of degrees of cloudiness at
parison frequencies based on the Breslau, 103; of percentages of
observations, 378-389; contin- deaf-mutes in offspring of deaf-
gency tables, 378-380; associa- mutes, 104 ; sizes of genera (Chryso-

        <pb n="440" />
melide), 364. See also Correla- tions, 226, 252; errors of sam-
tion, illustrations and examples. pling (small samples), 289 ; inocu-
Frequency-polygon, construction of, lation statistics and association,
84. 40; application of law of small
Frequency-surface, forms and ex- chances, 393; multiple happen-
amples of, 164-167 ; diagrams, 166, ings, 394.
facing 166 ; normal, diagram, 166. Grindley, H. S., refs., errors of feed-
See Correlation, normal. ing trials, 396.
Grouping of observations to form
GABAGLIO, A., ref., Teoria generale frequency-distribution, choice of
della statistica, 6. class-interval, 79-80; influence
Galloway, T., ref., Treatise on Prob- on mean, 113-114, 115, 116 ; in-
ability, 361. fluence on standard-deviation,
Galton, Sir Francis, Hereditary 140, 212.
Genius, 3 ; frequency-distribution
of consumptivity, 104; grades HAIR-COLOUR: and eye-colour, ex-
and percentiles, 150, 152 ; regres- ample of contingency, 61-63, 66-
sion, 176 ; Galton’s function (cor- 67 ; non-isotropy, 68, 69 ; theory
relation - coefficient), 204; bi- of sampling applied to certain
nomial machine, 299; normal data, 270-271, 272.
correlation, 328 ; data cited from, Hall, A. D., refs., errors of agri-
34, 46, 70. Refs., geometric mean, cultural experiment, 396.
130; percentiles, 154; correla- Harmonic mean. See Mean, har-
tion, 188, 332; correlation be- monic.
tween indices, 226; binomial Harris, J. A., refs., short method of
machine, 313; Natural Inherit- calculating coefficient of correla-
ance, 154, 313, 332. tion, 209 ; intra-class coefficients,
Gauss, C. F., use of term “mean 209; correlation, miscellaneous,
error,” 144. Refs., normal curve, 392; error in field experiments,
314 ; method of least squares, 361. 396.
Geiger, H., refs., law of small chances, Hart, B., refs., effect of errors on
269. correlation, 391.

Geometric mean. See Mean, geo- Head-breadths of Cambridge  stu-
metric. dents, table, 84 ; diagram, 85.
Geometric (logarithmic) mode, 128.  Helguero, F. de, refs., dissecting

Gibbs, J. Willard, Principles of compound normal curve, 315.
Statistical Mechanics, 4. Henry, A., refs., Calculus and Prob-
Gibson, Winifred, refs., Tables for ability, 397.
computing probable errors, 354, Heron, D., refs., association, 40 ; re-
358. lation between fertility and social
Gini, C., refs., index-numbers, 391. status, 208; defective physique
Grades, 152, 153. and intelligence, application of
Graphic method, of representing correction for age-distribution,
frequency-distributions, 83-87 ; of etc., 226; abac giving probable
interpolation for median of per- errors of correlation - coefficient,
centiles, 118, 151-152; of repre- 354, 358; probable error of a
senting correlation between two partial correlation-coefficient, 354.
variables, 180-181 ; of estimating Histogram, construction of, 84.
correlation - coefficient, 203-204 ; History, refs., of statistics generally,
of forming one binomial polygon 5-6, 390 ; of correlation, 188, 391 ;
from another, 295-297. of normal curve, 393.
Graunt, John, ref., Observations on Hollis, T., cited re Cosin’s Names of
the Bills of Mortality, 6. the Roman Catholics, etc., 100.
Gray, John, data cited from, 270. Hooker, R. H., correlation between
Greenwood, M., refs., index correla- weather and crops, 196 ; between
        <pb n="441" />
movements of two variables, 200, JACOB, S. M., ref., crops and rainfall,
201. Refs., correlation between 208, 226.
movements of two variables, 208;  Jevons, W. Stanley, use of geometric
weather and crops, 208, 253; mean, 127. Refs., system of

theory of partial correlation, 252. numerically definite reasoning

Horticulture, errors in, refs., 396. (theory of attributes), 15; index-

Houses, inhabited and uninhabited, numbers, 130; Pure Logic and
in rural and urban districts, 61-62; other Minor Works, 15; Investiga-
annual value of, table, 83: median, tions in Currency and Finance,
(qu. 4) 131; quartiles, (qu. 3) 155. 130.

Hudson, H. P., refs., frequency- Johannsen, W., Elemente der exakten
curves (epidemiology), 394. Erblichkeitslehre, 361.

Hull, C. H., ref., The Economic John, V., refs., Geschichte der Sta-
Writings of Sir William Petty, tistik, 5.
together with the Observations on Jones, D. C., refs., A First Course in
the Bills of Mortality more probably Statistics, 397.
by Captain John Graunt, 6. J-shaped frequency - distributions,

Husbands and wives, correlation be- 98-102, 363-364.
tween ages, table, 159 ; diagram, Julin, A., refs., Principes de Statis-
173 ; constants, (qu. 3) 189. tique, 397.

Hypergeometrical Series, ref., 289.

KAPTEYN, J. C., refs., Skew Fre-

ILLUSORY associations, 48-51. quency-curves in Biology and Sta-

Imbecility, associations with deaf- tistics, 130, 314.
mutism, 32-33, 38. Kelley, T. L., refs., correlation, 392 ;

Inclusive and exclusive notations for Statistical Method, 397.
statistics of attributes, 14-15. Keynes, J. M., refs., A Treatise on

Independence, criterion of, for attri- Probability, 397.
butes, 25-28; case of complete, Kick of a horse, deaths from, follow-
for attributes, 56-57; form of ing law of small chances, 265-266 ;
contingency or correlation table 369-370.
in case of, 71 ; goodness of fit test King, George, refs., graduation of
for, 378-387. age statistics, 105.

Index-numbers of prices, def., 126; Knibbs, G. H., refs., price index-
use of geometric mean for, 126- numbers, 390; frequency-curves,
127 ; use of harmonic mean, 129 ; 394.
refs., 130-131, 390-391. Koren, J., refs., History of Statistics,

Indices, correlation between, 215- 390.

216 ; refs., 226, 252, 392.

Infirmities, census tabulation of, LABOURERS, earnings of agricultural.
14-15; association between deaf- See Earnings.
mutism and imbecility, 32-33, 38. Laplace, Pierre Simon, Marquis de,

Inoculation, cholera, examples, 31- probable error of median, 344.
32, 34-35, 381-384. Refs., normal curve, 314; mean

Intermediate observations, in a deviation least about the median,
frequency-distribution, classifica- 154 ; Théorie analytique des proba-
tion of, 80-81, 362-363 ; in corre- bilités, 154, 354, 361 ; Essai philo-
lation table, 164. sophique, 361, 397.

Isotropy, def., 68 ; generally, 67-71; Larmor, Sir J., use of word “statis-
of normal correlation table, 328- tical,” 4.

331 ; refs., 73. Lee, Alice, data cited from, 96, 122,

Isserlis, L., refs., partial correlation- 160, 161. Refs., inheritance of
ratio, 252, 392; conditions for fertility and fecundity, 208, 226 ;
real significance of probable errors, tables of functions, 358, 359.

354 ; probable error of mean, 395. Lemna minor, correlation between
        <pb n="442" />
lengths of mother- and daughter- ing of, 220-225; of binomial
frond, 185-187. series, 299; standard error of,

Lexis, W., use of term “precision,” 334-350, (refs.) 355, 395.

144. Refs., Theorie der Massen- Mean deviation. See Deviation,
erscheinungen, 273 ; Abhandlungen mean.

zur Theorie der Bevölkerungs- und — error, 144. See Error, standard ;
Moralstatistik, 273, 361. Deviation, standard.

Linearity of regression, test for, — geometric, 108; generally, 123-
205-206, 352; refs., 391. See also 128 ; def., 123 ; calculation, 124 ;
Correlation-ratio. less than arithmetic mean, 123 ;

Lipps, G. F., refs., measures of difference from arithmetic mean
dependence (association, correla- in terms of dispersion, (qu. 8) 156 ;
tion, contingency, etc.), 40; of series compounded of others,
Fechner’s Kollektivmasslehre, 129, 124 ; of series of ratios or pro-
360. ducts, 124; in estimating inter-

Little, W., data as to agricultural censal populations, 125-126 ; con-
labourers’ earnings cited from, venience for index-numbers, 126—
137. 127 ; use on ground that devia-

Lloyd, W. E., refs., error in soil tions vary with absolute magni-
survey, 396. tude, 127-128 ; weighting of, 225.

Lobelia, application of theory of — harmonic, 108; generally, 128-
sampling to certain data, 269-270, 129 ; def., 128 ; calculation, 128;
272. is less than arithmetic and geo-

Logarithmic increase of population, metric means, 129; difference
125-126 ; logarithmic mode, 128. from arithmetic mean in terms of

Lyon, T. L., refs., errors of agri- dispersion, (qu. 9) 156; use in
cultural experiment, 396. averaging prices if index-numbers,

129 ; in theory of sampling, when

MACALISTER, Sir DONALD, ref., law numbers in samples vary, 264-265.
of geometric mean, 130, 314. — square error, 144.

Macdonell, W. R., data cited from, — weighted, 220-225; def., 220;
84, 90. difference between weighted and

March, L., refs., correlation, 208; unweighted means, 221-223 ; ap-
index-numbers, 390, 391. plication of weighting to correc-

Marriage-rate and trade, correlation tion of death-rates, ete., for age-
of movements, 199-201. and sex-distribution, 223-225 ;

Marshall, A., Money, Credit and Com- refs., 226.
merce, ref., 391. Median, 108; generally, 116-120 ;

Maxwell, Clerk, use of word ‘ sta- def., 116; indeterminate in cer-
tistical,” 4. tain cases, 116-117 ; unsuited to

Mean, arithmetic, generally, 108- discontinuous observations and
116; def., 108-109; nature of, small series, 116-117 ; calculation
109 ; calculation of, for a grouped of, 117 ; graphical determination
distribution, 109-113; influence of, 118; comparison with arith-
of grouping, 113-114, 115, 116; metic mean, 119; advantages in
position relatively to mode and special cases, 119-120 ; slight in-
median, 121-122, (refs.) 390; dia- fluence of outlying values on, 120 ;
grams, 113, 114; sum of devia- position relatively to mean and
tions from, is zero, 114 ; of series mode, 121-122, diagrams, 113,
compounded of others, 115; of 114, (refs.) 387; weighting of,
sum or difference, 115-116 ; com- 225 ; standard error of, 337-341.
parison with median, 119; sum- Meitzen, P. A., refs., Geschichte,
mary comparison with median and Theorie und Technik der Statistik, 6.
mode, mean is the best for all. Mendelian breeding experiments as
general purposes, 122-123 ; weight- illustrations, 37, 38, 128, 264-265,

        <pb n="443" />
267-268 ; refs., fluctuations of Normal curve of errors; deduction
sampling in, 273, 392. from binomial series, 301-302;
Mercer, W. B., refs., errors of agri- value of central ordinate, 304;
cultural experiment, 396. table of ordinates, 303; mean
Methods, statistical, purport of, 3-5 ; deviation and modulus, 304 ; com-
def., 5. parison with binomial series for
Mice, numbers in litters, harmonic moderate value of n, 304-305;
mean, 128-129; proportions of outline of more general methods of
albinos in litters, fluctuations deduction, 305-307 ; fitting to a
compared with theory of sam- given distribution, 307-308 ; the
pling, 264-265. table of areas, 310, and its use,
Milk testing, errors in, refs., 396. 309-310 ; quartile deviation and
Milton, John, use of word “statist,” probable error, 310-311 ; numeri-
1. cal examples of use of tables, 311-
Miner, J. R., correlation, ref., 392. 313 ; normality in fluctuations of
Mitchell, H. H., refs., errors of feed- sampling of the mean, 346-347.
ing trials, 396. Refs., general, 314 ; dissection of
Mode, 108; generally, 120-123; compound curve, 315; tables,
def., 120; approximate deter- 358-359. For normal correlation,
mination, from mean and median, see Correlation, normal.
121-122; diagrams showing posi- Norton, J. P., data cited from, 162.
tion relatively to mean and Ref., Statistical Studies in the New
median, 113, 114; logarithmic York Money Market, 208.
or geometric mode, 128; weight-
ing of, 225; refs., 130, 390. O’Brien, D. G., refs., errors in feed-
Modulus as measure of dispersion, ing experiments, 396.
144; origin from normal curve, Order, of a class, 10; of generalised
304. correlations, regressions, devia-
Mohl, Robert von, refs., Geschichte tions, and standard deviations,
und Literatur der Staatswissen- 233-234.
schaften, 5.
Moir, H., refs., frequency-curves PALGRAVE, Sir R. H. I., Dictionary
(mortality), 394. of Political Economy, 6.
Moment, first, def., 110 ; second and Pareto, V., refs., Cours d’économie
general, def., 135; calculation of politique, 105.
moments, (ref.) 154. Partial association. See Association,
Moore, L. Bramley, data cited from, partial.
96, 161. Ref., inheritance of fer- — correlation. See Correlation,
tility and fecundity, 208, 226. partial.
Morant, G., refs., law of small Patton, A. C., refs., Economic Sta-
chances, 393. tistics, 398.
Mortality. See Death-rates. Pauperism, in England and Wales,
Movements, correlation of, in two table, 93 ; diagrams, 92, 113; cal-
variables, methods, 197-201 ; refs., culation of mean, 111 ; of median,
208, 392. 117, 118; means, medians, and
modes for other years, 122 ; stan-
NEGATIVE classes and attributes, 10. dard-deviation, 138-140; mean
Newsholme, A., refs., birth-rates, deviation, 145-146; quartiles,
correction for age-distribution, 148 ; percentiles, 151-152.
etc., 226; Vital Statistics, 359, — correlation with out-relief, 182-
398. 185; with earnings and out-relief,
Niceforo, A., refs., La Méthode Sta- 177-181, (qu. 2) 189, 239-241,
tistique, 398. 245-247 ; with out-relief, propor-
Nixon, J. W., refs., experimental tion of aged, etc., 192-195, 241-
test of normal law, 314. 245.

        <pb n="444" />

Pearl, Raymond, normal distribu- for unmeasured characters, 152-
tion of number of seeds in Nelum- 153, refs., 333; standard errors
bium, 306. Refs., probable errors, of, 337-341 ; correlation between
355 ; errors in variety tests, 396 ; errors of sampling in, 341-342;
Introduction to Medical Biometry, refs., 154.

397. Perozzo, L., ref., applications of

Pearson, E. S., refs., polychoric co- theory of probability to correla-
efficients, 390; probable errors, tion of ages at marriage, 314.
395. Persons, W. M., refs., index-

Pearson, Karl, contingency, 63, 65 ; numbers, 390.
mode, 120; standard-deviation, Petals of Ranunculus bulbosus, fre-
144 ; coefficient of variation, 149 ; quency of, 102; unsuitability of
skewness, 149; inheritance of median in case of such a distribu-
fertility, 195; spurious correla- tion, 117.
tion between indices, 215; bi- Peters, J., refs., multiplication table,
nomial apparatus, 299 ; deduction 358.
of normal curve, 306; data cited Petty, Sir W., refs., Economic
from, 70, 78, 90, 96, 122, 160, 161. Writings, 6.

Refs., correlation of characters not Pickering, S. U., refs., errors of agri-
quantitatively measurable, 40, cultural experiment, 396.

333; contingency, etc., 72-73, Poincaré, H., refs., Calcul des prob-
333, 390, 395; frequency-curves, abilités, 361.

105, 130, 154, 273, 289, 314, 315, Poisson, S. D., law of small chances,
354, 393; binomial distribution 368, 369; refs., sex-ratio, 273 ;
and machine, 314; hypergeo- Recherches sur la probabilité des
metrical series, 289 ; dissection of Jugements, 273, 361.

compound normal curve, 315; Poppies, stigmatic rays on, fre-
calculation of moments, 225; quency, 78; unsuitability of
general methods of curve-fitting, median in such a distribution,
209; testing fit of theoretical to 116.

actual distribution, 315, 391, 394; Population, estimation of, between
correlation and correlation-ratio, censuses, 125-126 ; refs., 130, 253.
188, 209, 225, 252, 333, 390, 391, Positive classes and attributes, def.,
392 ; fitting of principal axes and 10; number of positive classes,
planes, 209, 333; correlation be- 13 ; sufficiency of, for tabulation,
tween indices, 226, inheritance of 13; expression of other fre-
fertility, 226; weighted mean, quencies, in terms of, 13-14.
reproductive selection, 226 ; prob- Poynting, J. H., correlation of fluc-
able errors, 355, 393, 395; tables tuations, 201 ; refs., 208.

for statisticians, 358; polychoric Precision, 144, 257, 304.

coefficients of correlation, 390; Prices, index-numbers of, 126; use
variate difference method, 392. of geometric mean, 126 ; of har-

Peas, applications of theory of sam- monic mean, 129 ; refs., 130-131,
pling to experiments in crossing, 390-391.

267-268. Principal axes, in correlation, 321-

Pecten, correlation between two 322; ref., 333.
diameters of shell, 158 ; constants, Probability, theory of, works on,
(qu. 3) 189. refs., 361, 396-397.

Percentage, standard error of, 256-
257 ; when numbers in samples QUARTILE deviation. See Quartiles.
vary, 264-265. See also Sam-  Quartiles, quartile deviation and
pling of attributes. semi-interquartile range, 134;

Percentiles, 150-153 ; def., 150 ; de- generally, 147-149; defs., 147;
termination, 151-152 ; advantages determination, 147-148 ; ratio of
and disadvantages, 152-153 : use q.d. to standard-deviation, 148,

        <pb n="445" />
310: advantages of q.d. as a Russell, E. J., refs., errors of agri-
measure of dispersion, 148-149; cultural experiment, 396.
difference between deviations of Rutherford, E., ref., law of small
quartiles from median as measure chances, 273.
of skewness, 149-150; ratio of
q.d. to median as measure of re- SAMPLING, theory of, generally, 254-
lative dispersion, 149; q.d. of 355 ; the problem, 254-256 ; refs.,
normal curve, 310; standard 273, 289, 313-315, 354-355, 392,
errors, 337-341, 341-343. 393, 395.
Quetelet, L. A. J., refs., Lettres sur la — of attributes: conditions as-
théorie des probabilités, 272, 361. sumed in simple sampling, 255-
256, 259-262 ; random in sense of
simple sampling, 289 ; standard-
RANDOM sampling, in sense of simple deviation of number or proportion
sampling, 289. of successes in n events, 256-257,
Range, unsuitability of, as a measure 299-300 ; examples from artificial
of dispersion, 133. chance, 258-259; application to
Ranks, 143, 153 ; methods of corre- sex-ratio, 262-264; when num-
lation based on (refs.), 333. bers in samples vary, 264-265 ;
Ranunculus, frequency of petals, when chance of success or failure
102 ; unsuitability of median for is small, 265-266, 366-370 ; stan-
such distributions, 117. dard error, def., 267 ; comparing
Registrar-General : correction or a sample with theory, 267-268 ;
standardisation of death-rates, comparing one sample with an-
224, refs., 226, 392; estimates other independent therefrom, 268-
of population, refs., 130; data 271 ; comparing one sample with
cited from Reports, 32-33, 52-53, another combined with it, 271-
77, 98, 163, 197-199, 199-201, 272 ; limitations to interpretation
222, 263, 283, 284, 285-286. of standard error when n is small,
Regressions, generally, 175-177; inverse interpretation, 276-279 ;
def., 175; total and partial, 233; limits as a measure of untrust-
standard errors of, 352; non- worthiness, 279-281; effect of
linear, 201-202, 205-206, 352; removing conditions of simple
direct deduction, 365-366; refs., sampling, 281-289; sampling
208-209, 391. from limited material, 287; bi-
Relative dispersion, 149. nomial distribution, 291-300 ; nor-
Reserves and discounts in American mal curve, 300-313 ; normal cor-
banks, correlation, 162 ; diagram, relation, 317-334; law of small
facing 166. chances, 366-370 ; refs., 272-273,
Rhind, A., ref., tables for computing 392, 393. See also Binomial
probable errors, 355, 359. series ; Hypergeometrical series ;
Rhodes, E. C., refs., sampling, 393, Normal curve; Correlation,
395; law of error, 393. normal.
Rietz, H. L., refs., Handbook of — of variables, conditions as-
Mathematical Statistics, 397. sumed in simple sampling, 335-
Ritchie-Scott, A., refs., correlation 337; standard errors of percen-
of polychoric table, 390. tiles (median and quartiles), 337-
Robinson, G., refs., Calculus of Obser- 341; dependence of standard
vations, 397. error of median on the form of the
Robinson, G. W., refs., error in soil distribution, 338-340; of differ-
surveys, 396. : ence between two percentiles,
Romanovsky, V., refs., frequency- 341-343; of arithmetic mean,
curves, 393. 344-350 ; of difference between
Ross, Sir R., refs., frequency-curves two means, 345-346 ; normality
(epidemiology), 394. of distribution of mean, 346-347 :
        <pb n="446" />
effect of removing conditions of Small chances, law of, 265-266, 366-
simple sampling on standard error 370; refs., 273, 393.
of mean, 347-350 ; standard error Snow, E. C., refs., estimates of popu-
of standard-deviation and co- lation, 130, 253 ; lines and planes
efficient of variation, 351 ; of co- of closest fit, 209.
efficients of correlation and re- Soil surveys, errors in, refs., 396.
gression, 352 ; of correlation-ratio Soper, H. E., frequency arrays, ref.,
and test for linearity of regression, 393:
352 ; refs., 354-356, 395. — refs., probable error of correlation-
Saunders, Miss E. R., data cited coefficient, 355, 395; of biserial
from, 37. expression for correlation - co-
Scale-reading, bias in, 362-363. efficient, 355; tables of exponen-
Scheibner, W., difference between tial binomial limit, 273.
arithmetic and geometric, arith- Southey, Robert, cited re Cosin’s
metic and harmonic means, (qu. 8 Names of the Roman Catholics,
and qu. 9) 156. etc., 100.
Scripture, E. W., use of word Spearman, C., effect of errors of
“statistics,” 3. observation on the standard-de-
Secrist, H., refs., Introduction to viation and coefficient of correla-
Statistical Methods, 398. tion, 213-214. Refs., effect of
Semi-interquartile range. See Quar- errors of observation, 225, 333,
tiles. 391 ; rank method of correlation,
Sex-ratio of births : correlation with 335
total births, 163, 175, 207; dia-  Splawa-Neyman, J., refs., probable
gram, 176; constants, (qu. 3) 189; errors, 395.
application of the theory of sam- Spurious correlation of indices, 215-
pling to, 262-264, (qu. 7) 275, (qu. 216 ; refs., 226, 392.
1, 2) 289, refs., 273; standard Standard-deviation. See Deviation,
error of ratio of male to female standard.
births, (qu. 11) 275. Standardisation of death-rates, 223
Shakespeare, W., use of word 225 ; refs., 226, 392.
statist,” 1. Statist, occurrence of the word in
Sheppard, W. F., correction of the Shakespeare and in Milton, 1.
standard-deviation for grouping, Statistical, introduction and de-
212, 307 ; theorem on correlation velopment. .in the meaning of the
of a normal distribution grouped word, 1-5 ; S. Account of Scotland,
round medians, (qu. 4) 334; 2; Royal S. Society, 3 ; methods,
normal curve tables, 337; stan- purport of, 3-5; def., 5.
dard errors of percentiles, 344. Statistics, introduction and develop-
Refs., calculation and correction ment in meaning of word, 1-5;
of moments, 225; normal curve def., 5; theory of, def., 5.
and correlation, theory of sam-  Statures of males in U.K., tables, 88,
pling, 314, 333, 355; tables of 90; diagrams, 89, 91; calcula-
normal function and its integral, tion of mean, 112; means and
359. medians, 117, (qu. 1) 131; stan-
Significant differences, 266. dard-deviation, 141 ; percentiles,
Sinclair, Sir John, use of words 153 ; standard-deviation, “mean
“ statistics,” ¢ statistical,” 2. deviation, and quartiles, (qu. 1)
Skew or asymmetrical frequency- 155 ; distribution fitted to normal
distributions, 90-102. See also curve, 305-306, 307-308; dia-
Frequency-distributions. gram, 306; standard errors of
Skewness of frequency-distributions, mean and median, of first to
107 ; measures of, 149-150. ninth deciles, 341, 343, 344-345 ;
Slutsky, E., refs., fit of regression of standard-deviation and semi-
lines, 209, 391. interquartile range, (qu. 5) 355.

        <pb n="447" />

Statures, correlation of, for father and son, 160; diagrams, facing 166, 174; constants, (qu. 3) 189; testing for normality, 322-328; for isotropy, 329-331; diagram of diagonal distribution, 325, of fitted contour-lines, 327.
Stead, H. G., correlation-coefficients, ref., 392.
Stevenson, T. H. C., refs., birth-rates, correction of, for age-distribution, 226.
Stigmatic rays on poppies, frequency, 78; unsuitability of median for such distributions, 116.
Stirling, James, expression for factorials of large numbers, 304.
Stratton, F. J. M., refs., errors of agricultural experiment, 396.
“Student” (pseudonym), refs., law of small chances, 273, 393; probable errors, 355, 395; deviations from Poisson’s Law, 393; probable errors of Spearman’s correlation-coefficients, 395; method of cereal testing, 396.
Surface, F. M., refs., errors in variety tests, 396.
Symmetrical frequency-distributions, 87-90. See also Frequency-distributions; Normal curve.
Symons, G. J., use of word “statistics” in British Rainfall, 3.
TABULATION, of statistics of attributes, 11-14, 37; of a frequency-distribution, 81-83; of a correlation table, 164.
Tatham, John, refs., standardisation of death-rates, 226.
Thomson, G. H., refs., The Essentials of Mental Measurement, 396.
Thorndike, E. L., refs., methods of measuring correlation, 333; Theory of Mental and Social Measurements, 361.
Time-correlation problem, 197-201; refs., 208-209, 392.
Tocher, J. F., refs., contingency, 390.
Todhunter, I., refs., History of the Mathematical Theory of Probability, 6.
Trachtenberg, M. I., refs., property of median, 154.
Tschuprow, A. A., refs., mathematical expectation of moments, 395; distribution of means, 395; Korrelationstheorie, 397.
Type of array, def., 164.
ULTIMATE classes and frequencies, def., 12; sufficiency of, for tabulation, 12-13.
Universe, def., 17; specification of, 37, 18.
U-shaped frequency-distributions, 102-105.
VALUE, annual, of dwelling-houses, table, 83; median, (qu. 4) 131; quartiles, (qu. 3) 155.
— of estates in 1715, table, 100; diagram, 101.
Variables, theory of, generally, 75-253; def., 7, 75.
Variance, for square of standard deviation, 144.
Variates, def., 150.
Variation, coefficient of, 149; standard error of, 351, 352.
Variety, tests, errors in, refs., 396.
Venn, John, refs., Logic of Chance, sex-ratio, 273, 361.
Verschaeffelt, E., relative dispersion, 149. Refs., measure of relative dispersion, 154.
Vigor, H. D., data cited from, 163. Refs., sex-ratio, 273.
WAGES of agricultural labourers. See Earnings.
Wages, real, refs., 390-391.
Warner, F., refs., study of defects in school children, notation for statistics of attributes, 15.
Water analysis, methods, refs., 393.
Waters, A. C., refs., estimating intercensal populations, 130.
Weather and crops, correlation, 196-197; refs., 208.
Weight of males in U.K., table, 95; diagram, 94; mean, median, and mode, (qu. 2) 131; standard deviation, mean deviation, and quartiles, (qu.) 155.
Weighted mean. See Mean, weighted; also Mean, geometric; Median; Mode.
Weldon, W. F. R., dice-throwing experiments, 258-259, 373-376.

        <pb n="448" />
West, C. J., refs., Introduction to Mathematical Statistics, 397.
Westergaard, H., refs., Theorie der Statistik, 6, 273, 361.
Whipple, G. C., refs., Vital Statistics, 398.
Whitaker, Lucy, ref., law of small numbers, 273.
Whittaker, E. T., refs., Calculus of Observations, 397.
Wicksell, S. D., refs., correlation, 391-392.
Willcox, W. F., citation of Bielfeld, 1.
Wolfenden, H. H., mortalities and death-rates, ref., 392.
Wood, Frances, refs., index-correlations, 226, 252; index-numbers, 390.
Wood, T. B., refs., errors of agricultural experiment, 396; variation in mangels, 396.
Working classes, cost of living, refs., 390-391.
Young, Allyn A., refs., age statistics, 105.
Young, Andrew, refs., probable error of coefficient of contingency, 395.
Yule, G. U., use of term characteristic lines (lines of regression), 177; problem of pauperism, 192; data cited from, 78, 93, 122, 140, 163, 185; facing 186, 259, 385. Refs., history of words “statistics,” “statistical,” 5; attributes, association, consistence, etc., 15, 23, 39, 40, 57; isotropy, influence of bias in statistics of qualities, 73; correlation, 188, 226, 252, 392; correlation between indices, 226; frequency-curves, 314, 394; probable errors, 355, 395; pauperism, 130, 208, 253; birth-rates, 208, 226; sex-ratio, 273; fluctuations of sampling in Mendelian ratios, 273; time-correlation problem, 392; application of law of small chances, 393; goodness of fit in association and contingency-tables, 394; yield trials, 396.
ZIMMERMANN, E. A. W., use of the words “statistics,” “statistical,” in English, 1.
Zimmermann, H., multiplication table, 358.
Zizek, F., refs., Die statistischen Mittelwerthe and translation, 129.
PRINTED IN GREAT BRITAIN BY NEILL AND CO., LTD., EDINBURGH.

        <pb n="449" />
        <pb n="450" />
        <pb n="451" />
        <pb n="452" />
        <pb n="453" />
        <pb n="454" />
        <pb n="455" />
        XIV.—REMOVING LIMITATIONS OF SIMPLE SAMPLING. 281

7. Such an examination may be of service, however, as
indicating one possible source of bias, viz. great heterogeneity in
the original material. If, for example, in the first illustration,
the hair-colours of the children differed largely in the different
schools—much more largely than would be accounted for by
fluctuations of simple sampling—it would be obvious that one
school would tend to give an unrepresentative sample, and
questionable therefore whether the five, ten or fifteen schools
observed might not also have given an unrepresentative sample.
Similarly, if the herrings in different catches varied largely, it
would, again, be difficult to get a representative sample for a
large area. But while the dissimilarity of subsamples would
then be evidence as to the difficulty of obtaining a representative
sample, the similarity of subsamples would, of course, be no
evidence that the sample was representative, for some very
different material which should have been represented might
have been missed or overlooked.
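The comparison of subsamples described above can be made roughly quantitative. The Python below is an illustration only and is not the book's own procedure. It assumes each subsample (a school, a catch) is summarised by a count of A's and a total. The subsamples are pooled to estimate p, and the function forms the sum of n(π − p)²/pq over the subsamples, divided by one less than their number. Under simple sampling this Lexis-type ratio should lie near 1. A ratio well above 1 points to heterogeneity in the material. The name `dispersion_ratio` and the counts in the usage line are hypothetical.

```python
def dispersion_ratio(successes, sizes):
    """Lexis-type dispersion ratio for k subsamples (hypothetical helper).

    successes[i]: number of A's observed in subsample i
    sizes[i]:     number of individuals observed in subsample i
    """
    k = len(sizes)
    if k in (0, 1):
        raise ValueError("need at least two subsamples")
    total = sum(sizes)
    p = sum(successes) / total   # pooled proportion of A's
    q = 1 - p
    if p * q == 0:
        raise ValueError("pooled proportion must lie strictly between 0 and 1")
    # Sum of n * (pi - p)**2 / (p * q): each term compares a subsample
    # proportion with the pooled one, scaled by its simple-sampling variance pq/n.
    chi_sq = sum(n * (a / n - p) ** 2 for a, n in zip(successes, sizes)) / (p * q)
    # Divide by k - 1 so the ratio is about 1 under simple sampling.
    return chi_sq / (k - 1)


# Hypothetical counts of A's in four subsamples of 200 each.
print(dispersion_ratio([40, 44, 38, 42], [200, 200, 200, 200]))
```

As the text warns, a ratio near 1 does not show that the sample is representative. It shows only that the subsamples agree among themselves.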

8. The student must therefore be very careful to remember
that even if some observed difference exceed the limits of fluctua-
tion in simple sampling, it does not follow that it exceeds the
limits of fluctuation due to what the practical man would regard,
and quite rightly regard, as the chances of sampling. Further,
he must remember that if the standard error be small, it by no
means follows that the result is necessarily trustworthy: the
smallness of the standard error only indicates that it is not
untrustworthy owing to the magnitude of fluctuations of simple
sampling. It may be quite untrustworthy for other reasons:
owing to bias in taking the sample, for instance, or owing to definite
errors in classifying the A’s and α’s. On the other hand, of course,
it should also be borne in mind that an observed proportion is not
necessarily incorrect, but merely to a greater or less extent
untrustworthy if the standard error be large. Similarly, if an
observed proportion π₁ in a sample drawn from one universe be
greater than an observed proportion π₂ in a sample drawn from
another universe, but π₁ − π₂ is considerably less than three times
the standard error of the difference, it does not, of course, follow
that the true proportions for the given universes, p₁ and p₂, are
most probably equal. On the contrary, p₁ most likely exceeds p₂;
the standard error only warns us that this conclusion is more or
less uncertain, and that possibly p₂ may even exceed p₁.
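The comparison in this paragraph can be sketched numerically. The Python below is a minimal illustration under stated assumptions, not the book's own computation. It treats the two samples as independent simple samples. It estimates the standard error of π₁ − π₂ as the square root of π₁(1 − π₁)/n₁ + π₂(1 − π₂)/n₂, using the observed proportions in place of the unknown p₁ and p₂. It then applies the three-standard-error criterion mentioned in the text. The helper names and the sample counts are hypothetical.

```python
import math


def difference_of_proportions(a1, n1, a2, n2):
    """Observed difference pi1 - pi2, its estimated standard error, and their ratio.

    a1, n1: A's and total in the sample from the first universe
    a2, n2: A's and total in the sample from the second universe
    Assumes independent simple samples; observed proportions stand in for p1, p2.
    """
    pi1, pi2 = a1 / n1, a2 / n2
    se = math.sqrt(pi1 * (1 - pi1) / n1 + pi2 * (1 - pi2) / n2)
    if se == 0:
        raise ValueError("standard error is zero; proportions are all 0 or 1")
    diff = pi1 - pi2
    return diff, se, diff / se


def interpret(diff, se):
    """Apply the three-standard-error criterion discussed in the text."""
    if abs(diff) >= 3 * se:
        return "beyond fluctuations of simple sampling"
    # A smaller difference does not show p1 == p2: the observed order is still
    # the more probable one, and only the conclusion is uncertain.
    return "within fluctuations of simple sampling"


diff, se, ratio = difference_of_proportions(60, 100, 50, 100)
print(interpret(diff, se))  # prints: within fluctuations of simple sampling
```

Here the difference of 0.10 is less than three times its standard error of about 0.07. In the text's terms, p₁ most likely exceeds p₂, but the conclusion remains uncertain.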

9. Let us now consider the effect, on the standard-deviation of
sampling, of divergences from the conditions of simple sampling
which were laid down in § 8 of Chap. XIII.

First suppose the condition (a) to break down, so that there is
some essential difference between the localities from which, or the
      </div>
    </body>
  </text>
</TEI>
