Hadow (1924) (page numbers in brackets) Notes on the text
The Hadow Report (1924)
Psychological tests of educable capacity and their possible use in the public system of education London: HM Stationery Office [page 1] 1. For over two thousand years, in its general problems and accepted principles, psychology presented no great change or development. It continued, as it had begun, a branch of metaphysics rather than of science. The chief method of the psychologist was still introspection; and his chief subject, himself. All that he could offer to the teacher was an a priori system of generalised maxims, vague, speculative, commonplace, and of little practical use. But during the early part of the nineteenth century, influenced chiefly by the introduction from other natural sciences of an experimental procedure and of a mathematical technique, the study of the mind took a new direction. The psychologist left his armchair for the laboratory: and his inquiries and his methods moved further away from those of the philosopher, and inclined for a while towards those of the physiologist. His science, however, for more than another fifty years, still remained an abstract discipline, a science pure and unapplied, preoccupied with fundamental issues as to the nature and the working of mind in general. Differences between one mind and another it ignored. From time to time, indeed, when studying some sample person in his laboratory, the notice of the experimenter was inevitably diverted from a general analysis of thought in the abstract to the concrete peculiarities in the thoughts of particular thinkers. Yet for long these individual differences were looked upon merely as disturbing irregularities, deviations which had to be allowed for, or averaged away, before any sound conclusion of universal scope could be examined or deduced. 'Personal equations', therefore, were devised to discount these so-called errors of measurement. At length, however, attention came to be focused directly upon these differentiating qualities, in and for themselves. 
Their origin was investigated; and their variations with varying age, sex, race, heredity and (1) This historical introduction was prepared at the request of the Committee by Dr Cyril Burt. [page 2] environment were themselves discovered to be rich in interest. And thus at last an independent branch of mental science, 'differential' or 'individual psychology', was founded and named. (1) 2. In previous centuries, it is true, some crude and curious attempts had been made upon empirical lines to draw up brief rules for the estimation of individual capacity and character. Most of these efforts, however, relied upon external and physical signs, a tendency which today, though obsolescent, is not extinct. In the eighteenth century, Lavater, bringing together many traditional precepts, and adding several shrewd observations of his own, published a treatise on physiognomy; a man's disposition and abilities were to be inferred, according to ingenious principles, from the features and expression of his face. (2). Thirty years later Gall propounded his system of phrenology, localising some twenty-six complex propensities in various areas or 'organs' of the brain, and deducing their development from the relative prominence of the overlying regions of the skull. (3) 3. Anatomical Stigmata. Of all such attempts at inferring psychical qualities from physical signs the most recent and the most widespread is the doctrine of anatomical stigmata. This doctrine received its most thoroughgoing application in the theories of the Italian school of criminology, whose exponents, (1) The problems, principles and development of individual psychology may be traced (down to the date of publication) in W Stern's Differentielle Psychologie (1911), which contains a classified bibliography of over 1,500 references.
(2) The many teachers who are interested in the possibility of physiognomical diagnosis may be referred, not only to Lavater's quaint Essay on Physiognomy destined to make man known and loved (1772), but also to Bell's Anatomy and Philosophy of Expression (1806; 3rd ed. (posthumous) 1872), Darwin's Expression of the Emotions of Man and Animals (2nd. ed. 1889), Mantegazza's Physiognomy and Expression (1904), and, for recent statistical studies, Child Study, June 1919, 'Facial expression as an index of mentality', and Langfeld, 'Judgements of emotions from facial expression', (J. Abn. Psych, xiii., 172; cf. also Psych. Rev., xxv., 488.)
(3) Gall and Spurzheim. Anatomie et Physiologie du Systeme Nerveux en General et du Cerveau en Particulier avec des observations sur la possibilite de reconnoitre plusieurs dispositions intellectuelles et morales de l'homme et des animaux par la configuration de leurs tetes (1810). [page 3] under the lead of Cesare Lombroso, published their observations and conclusions during the closing decades of the nineteenth century. The upholders of this doctrine claimed that it was possible to recognise defective intelligence and degenerate character from certain visible marks or malformations connected principally with the size and shape of the head. Small, misshapen, or asymmetrical skulls, low, narrow and bossed foreheads, broad, depressed or upturned noses, narrow, high or V-shaped palates, lobeless, projecting or crumpled ears, these and many similar anatomical anomalies were thought to indicate a reversion to some low and primitive type. Even at the present day there are still many teachers and medical officers who prefer to base their diagnoses of mental defect or moral degeneracy upon physical peculiarities of this kind. (1) Generally, however, little emphasis is now placed upon cranial stigmata. In destroying all practical reliance upon the inspection and measurement of heads, the work of Professor Karl Pearson and his followers has perhaps played the decisive part. (2) Pearson, after analysing data from 5,000 school children and 1,000 undergraduates, concluded that the correlations between intelligence and the dimensions of the skull, though positive, are far too small for trustworthy predictions. There are, (1) Of early work done upon these lines in England, the most important was that of Dr Francis Warner. Extensive examinations in various English cities were carried out by him, and reported to a Special Committee of the London Charity Organisation (1876) appointed to inquire into the education and care of imbeciles. 
Whatever may nowadays be thought of Dr Warner's reliance upon physical criteria such as 'nerve signs' and 'stigmata of degeneracy', his surveys did much to draw public attention in this country to the question of inborn differences of intelligence and to the problem of the subnormal child.
(2) 'On the Relationship of Intelligence to Size and Shape of the Head'. Biometrika, V. 1906. pp. 105-146. Galton was here the chief pioneer ('Head Growth in Students', Nature XXXVIII, 1886, pp. 14 and 40). Both he and Binet, (L'Annee Psychologique VII, 1900, pp. 314-42; cf. also XVI, 1910, pp. 1-30) obtained slight but positive results. Later researches have emphasised the smallness of the correlation more than its positive character. Dr Goring's research upon convicts in HM prisons also deserves mention as giving what some have considered the death blow to the more extreme claims of Lombroso and the anthropometric school of criminology. (The English Convict: A Statistical Study. HM Stationery Office, 1913.) [page 4] indeed, very few persons, whether normal or abnormal, who do not possess at least one or two so-called stigmata of degeneracy. Unless several of such stigmata are found together in one and the same individual, the indications mean little or nothing; and the conjunction of a number of them is rarely seen, even in special schools for educable defectives, or, in fact, anywhere beyond the walls of asylums for low-grade imbeciles, whither the so-called clinical types are now commonly removed. Except for these rare pathological cases, psychologists are nowadays agreed in distrusting all snapshot judgements based upon an inspection of the face and head; and, when diagnosing mental characteristics, rely upon mental rather than upon physical criteria. (1) 4. Of the application of the new experimental methods to the direct study of the differences of individual minds, the most conspicuous result was the invention of mental tests. After many long and doubtful trials the possibility of mental testing has at length received general acceptance; and recently upon an enormous scale, its value has been demonstrated, and its uses popularised, by the wholesale psychological examination of recruits for the American army. 5. Early Experimental Work. 
It is a widespread but erroneous notion, current both among teachers and the general public, that psychological testing is a foreign invention, a new and alien import brought to England from abroad; school testing, it is (1) In each of these views - in the speculations of the physiognomist, in the fancies of the phrenologist, and in the doctrine of anatomical stigmata - there are, of course, certain elements of truth. Glandular disturbances, for example, affect both character and intelligence on the one hand, and the development of the bony structures - most conspicuously, perhaps, the bony structures of the head - upon the other. Racial types, again, are characterised by slight differences in intellect and emotionality, and tend to exhibit distinctive traits in the conformation of the skull and face. Quite recently, the method of correlation has shown that often the inferences of practised observers from peculiarities of facial expression are trustworthy to a tolerably high degree, though here, indeed, it is not so much the shape of the hard features or of the bony framework that is significant, but rather the tonus and contractions of the facial muscles, the habitual expression of moods, and the passing responses to emotional stimuli. (See references cited in footnote 2 p. 2 above). [page 5] supposed, has come from France, vocational testing from the United States. This, however, is unjust. The conception of the mental test, to whatever extent it may have been developed and applied of recent years in other countries, was originally put forward by an English scientist. It was Sir Francis Galton who in 1883 first announced the possibility of measuring intellectual abilities by simple laboratory tests. (1) Galton's interest, however, was subsequently diverted towards problems of anthropology and eugenics. 
He suggested, indeed, or elaborated, some of the chief statistical devices now in use - the method of percentiles and the method of correlation; from time to time, too, he designed special apparatus for testing the muscular, auditory, and other senses of different persons; he even commenced various investigations into the correspondence between anthropometric measurements and intellectual characteristics; and finally, in 1890, he appended to an article on Mental Tests and Measurements by Professor Cattell, (2) an ambitious proposal for comparing laboratory data with independent estimates of human qualities. The work of Cattell and Galton may be claimed as the first seed which bore such prolific fruit in the numerous American researches of later years. Cattell's most distinguished pupil, Prof. EL Thorndike, of Columbia University, who has himself carried out and inspired more numerous researches upon educational tests than any other man, writes thus of his own master: 'Cattell refined Galton's methods, and won recognition for the mental measurement of individuals as a standard division of psychology ... His work on mental tests was the first of a series of influential contributions made during the last decade of the nineteenth century; and was for many of us the introduction to the whole topic of individual psychology'. (3) Galton's early tests were carried out mainly in his Anthropometric Laboratory at South Kensington Museum. Cattell instituted a similar series of mental measurements for the students (1) Galton, F, Enquiries into Human Faculty and its Development (1883).
(2) Dr J McK Cattell (Professor of Psychology, first in the University of Pennsylvania and later in Columbia University) Mind. 1890. XV. 9, p. 380.
(3) Col. Univ. Contr. to Phil. and Psych., XXII, iv., (1914). 92. [page 6] of Columbia University. (1) Both of them aimed, by the intensive application of a long programme of laboratory tests to separate individuals, at making 'a systematic inventory of their mental traits'. Although rich in suggestions for later inquiries, Galton with his English collaborators achieved along these lines no final conclusions of his own. And during the closing years of the nineteenth century the notion of mental testing, in spite of its English origin, was taken over almost exclusively by foreign investigators - by such early workers as Oehrn and Ebbinghaus in Germany, and Boas and Gilbert in America, and, later on, Bourdon, Binet, Henri, and their collaborators in France. These earliest foreign researches, carried out between 1890 and 1900, though numerous, proved contradictory and disappointing. Yet, even at the present time, they are by no means uninstructive. They exhibit, in a clear and concrete fashion, the main recurrent fallacies that beset all who attempt to apply psychological tests to practical ends. Of these initial investigations nearly every one is characterised by two limitations: first, the experimenters confined themselves almost entirely to the simplest mental processes: secondly, they possessed no adequate statistical procedure for determining the value of their results. 6. The tests first used for the measurement of mental capacity consisted chiefly of the traditional laboratory experiments upon simple sensory discrimination. For this there seem to have been several reasons. To begin with, the majority of the new experiments in pure psychology, experiments chiefly suggested by the methods of physiology and physics, were at that time concerned with sensory capacities - with the ear, the eye and the skin, the perception of sound, of light, of weight, and so forth. 
Further, theoretical psychologists both in England and abroad were still much under the influence of that British school of philosophy, whose axiom had been 'Nihil in intellectu quod non prius in sensu'. The working hypothesis underlying all these early studies with sensory tests seems to have been an emphasis upon (1) J McK Cattell and L Farrand Psych. Rev., (1896), 618-648 'Physical and Mental Measurements of the Students of Columbia University'. [page 7] cognitive attention; it was supposed that the physiological limits of the several sense organs were much the same in different individuals, but that the discrimination of the data supplied by those sense organs depended upon the power of attentive analysis, a power which appeared to vary greatly from one man to another, and which many writers at this date were disposed to identify with general intellectual capacity. 7. Lower Senses: Tests of Touch and Muscle Sense. Galton, and many of the earliest investigators abroad, dealt principally with what may be termed the lower senses. In his enquiries into human faculty and its development, reported in a small volume with that title, (1) he had noticed that mental ability appeared to be correlated with delicate sensory discrimination, as tested, for example, in the comparison of graded weights. Fellows of the Royal Society, he found, could distinguish tiny differences in heaviness with an accuracy equalled only by the practised sorter at the post office: incidentally, he observed, what almost all psychological experience has since confirmed, that tests of capacity as distinct from attainment are, in general, but little affected by practice or experience - far less than would otherwise be supposed. Later, both Gilbert in America (2) and Spearman in England (3) confirmed this correlation. 
(4) Another lowly form of sensory discrimination, much favoured by the early psychophysicists, was the power of distinguishing two touches upon the skin, the touches being produced by the points of the compass, or by a special instrument known as the aesthesiometer. Binet in France (5) and later Schuyten in Belgium, (6) both following an earlier German investigation by Wagner, (7) found, or thought they found, that touch discrimination, tested in this manner, formed a practicable measure of intelligence. This notion, however, more accurate methods have not confirmed. With the lower senses, it would seem, at any rate within broad (1) See Everyman's Library Edition, pp. 23-25 and 248-251.
(2) Stud. Yale Psych. Lab., 1893, II., pp. 40 et seq.
(3) Amer. Journ. Psychol., 1904, XV., pp. 201 et seq.
(4) The actual measurement of these correspondences by the precise method of correlation was only introduced by later investigators. See below, Section 13.
(5) L'Annee Psych., 1899, VI., pp. 248 et seq.
(6) Arch. de Psych., 1903, II., pp. 321-6, Paedologisch Jaarboek, VII., 1909, pp. 73-116 et al.
(7) Samml. u. Abhandl. a. d. geb. d. Pad. Psych., I., 1896. [page 8] limits, young children are almost as sensitive as old, dull children as bright, and savages as civilised men. 8. Higher Senses: Tests of Hearing and Vision. With tests of the higher senses, of hearing and of sight, some positive correspondence with general intelligence has been more certainly established. Many of the earlier physical surveys carried out in schools had demonstrated that defective vision was extremely common among backward children, although, of course, among the short-sighted many able persons are, nevertheless, to be found. (1) And, measuring experimentally the power to discriminate different shades of brightness by the eye, Gilbert found a small but significant agreement between his visual tests and the intelligence of his examinees; Spearman too, found a similar but larger agreement. One of the earliest of all researches upon the interrelations between different mental capacities, the investigation of Oehrn (2) had introduced a supposed test of visual discrimination somewhat more complex, which has long remained a favourite. This was the cancellation test, consisting in the erasure of given letters of the alphabet from a page of printed matter. Both Oehrn himself in Germany, and Bourdon following him in France (3) found the results of this exercise tally closely with those of other intellectual tests; and later writers, using as a criterion the independent judgement of careful observers, confirmed the presence, if not the amount, of this correspondence. Tests of hearing the earlier experimenters do not seem to have employed though here, once again, previous medical inspections had shown that defective hearing impedes educational progress quite as noticeably as defective sight. (4) The auditory test preferred by the (1) See, e.g. the report of the extensive tests carried out by Smedley at Chicago: 46th Annual Report of Board of Education (Chicago), 1899. 
Cohn had endeavoured to demonstrate, not unsuccessfully, that myopia is essentially a disease of civilisation, rare among country children and common in the town: see his Hygiene of the Eye, 1886, and Die Sehleistung von 50,000 Schulkindern, Breslau, 1899.
(2) 'Experimentelle Studien zur Individual-Psychologie', Dorpater Dissertation, 1889: see also Kraepelin, Psych. Arbeit., I., pp. 92-151.
(3) Revue Philosophique, XL., pp. 153 et seq.
(4) See, e.g. Smedley, loc. cit. sup., and for a summary of earlier investigations, Chrisman, 'The Hearing of school children', Ped. Sem, 1893. II., p. 397. [page 9] psychologist consists usually in some form of distinguishing the pitch of musical notes. The measurements obtained from such a test agree with independent estimates of intellectual ability more nearly than those obtained from any other form of simple sensory discrimination. Spearman, for example, working at a later date, and using more precise statistical methods, found a correlation of 0.94; that is to say (as he puts it), 'the intellectual function ... is nine parts out of ten responsible for success in such a simple act as discrimination of pitch'. (1) His observation has since been confirmed by other investigators, such as Whipple (2) who obtained an analogous correlation appreciable and positive, though not quite so high. (3) The peculiar relation between auditory discrimination and general intelligence has been tentatively explained by the dependence of the development of the higher intellectual capacities in man upon his power of speech, and of his power of speech upon his power of hearing. These earlier experiments upon sensory capacity culminated at length in the detailed and elaborate research by Spearman, to which reference has already been made, and which will be described in greater detail on a later page. He found a positive correlation between all measurable modes of sensory discrimination; and attributed this to a common factor which he termed 'general discrimination'. Further, he inferred a similar common factor underlying the different manifestations of intelligence, terming this 'general intelligence'. And, finally, he concluded that 'the common and essential element in intelligence wholly coincides with the common and essential element in the sensory functions'. 9. 
Meanwhile, both in England and abroad, a reaction had been gathering ground against what was held to be the intellectualistic and sensationalistic bias of traditional psychology. (1) 'General Intelligence Objectively Determined and Measured', Amer. Journ. Psychol., XV, 1904, p. 285.
(2) Manual of Mental and Physical Tests, p. 221.
(3) Seashore reported no correlation between pitch discrimination and intelligence; and considered rather that the results of this test indicated musical capacity. Spearman, however, applying to Seashore's own data more careful methods of correlation obtained from them a distinct and positive coefficient. [page 10] Emphasis was being placed more and more upon the active aspects of the mind. In the United States especially, psychology was acquiring a dynamic and pragmatic cast. And it is, therefore, natural to find American investigators beginning to substitute tests of a more active character for the earlier tests of simple cognitive capacity. The measurement of sensation gave way to the measurement of movement. In the new quantitative psychology one of the earliest forms of laboratory experiment had been the measuring of reaction time, of the quickness of motor response to a sensory stimulus. Using this form of experiment side by side with tests of sensory discrimination, Gilbert (1) discovered a far larger correspondence between intelligence and speed of reaction than he had observed between intelligence and the discrimination of sensations. Binet (2) also adopted the reaction experiment; but found that the reaction times harmonised poorly with intellectual capacity, whereas a test of pure movement - simple tapping in fact - showed, at any rate among young children, a closer agreement. Tapping has since been much employed by other investigators; and most of them - Smedley, Bolton, Abelson, Kirkpatrick and Gilbert himself - have verified the existence of some such correlation. Once more, however, the correspondence obtained from all those simple motor tests has seldom proved to be conspicuously high. With older and brighter children it usually proved smaller than with the younger, duller, or defective children; and one or two careful investigators - such as Bagley and Whipple - have discovered no correspondence whatever. 
With strength of movement, as with speed of movement, the same low values have been found. The dynamometer, for example, may be conveniently used to test the power of grip. But here, as before, the connection with intelligence, though positive, is, as a rule, variable and slight. (3) (1) Stud. Yale Psych. Lab., II., 1893, pp. 40 et seq.
(2) L'Annee Psychologique, IV., 1897, pp. 64-98, 'Epreuves de Vitesse chez les Jeunes Garcons'.
(3) Binet et Vaschide, L'Annee Psych., IV., 1897. pp. 15-63. 'Experience de Force Musculaire chez les Jeunes Garcons', Smedley, 46th Ann. Rep. Board of Educ. Chicago, 1899. Schuyten L'Annee Psych., IX., 1902, pp. 448-9. [page 11] 10. It will be noticed that, throughout this earlier series of researches, the experimenters were confining themselves almost exclusively for their tests to the simplest mental functions, functions that need for their analysis and measurement technical instruments and apparatus, often difficult to use, generally complicated in structure, and seldom very portable in size and weight. Such tests and such contrivances were rarely fitted for application to young children in the ordinary schools. From time to time, it is true, tests of memory, attention, association or illusion, were added to the tests of sensation or movement. But even these were of a relatively elementary order. Looking back upon these methods, therefore, it is not surprising to discover that they contained no promise of a sure or certain indication for a quality so complex as educable capacity. This slow suspicion, now beginning to break upon the believers in laboratory apparatus themselves, was ultimately confirmed by two or three investigations of paramount importance, which approached the general problem from a somewhat novel angle. 11. Experiments on Formal Training. Of these newer researches the first was mainly negative in its result. It comprised a group of experiments inspired by Professor EL Thorndike, an inquiry which has had far-reaching effect upon educational theory. Primarily, his problem was concerned, not with mental testing, but with mental training and its transfer - with the traditional doctrine of so-called formal discipline. 
Hitherto, most investigators had been content to assume that each of their tests measured some general faculty or function, and that one test only was all that was needful to provide a complete measurement of the faculty or function in question. Crossing out letters was taken to measure perception; repeating numbers, to measure memory; reading syllables exposed for a brief instant, to measure attention. And these assumptions, as a rule, were left wholly unverified. Often there was no attempt to check the supposed correlation with faculties or with intelligence, by comparing the test results with independent estimates of ordinary observers; the experimenter was content merely to relate one test to another and announce that 'memory is proportional to perception' or that 'association varies [page 12] inversely with other intellectual powers'. Conformably with this implicit belief, it was almost universally accepted that the repeated exercise of intelligence, of will, or of attention, in some limited direction or upon some special subject, would inevitably be followed by a general all-round improvement in the faculty involved. (1) Towards the close of the century, however, the experiments of Thorndike, Woodworth and others showed very conclusively that there might be little or no connection between test results for one and the same function, even when both tests and functions were designated by an identical name. Training in one mental activity exercised no appreciable influence upon other mental activities, even where these activities seemed to be most intimately akin. (2) And later investigators, pursuing this new line of thought, showed that there might be extremely low correspondence between the results of different tests all claiming to measure the same general function - memory, attention, suggestibility, or whatever it might be. 12. Negative Conclusions. 
The conflicting deductions of various investigators, working with similar tests and a similar procedure, thus seemed to indicate that mental testing could lead to no consistent or trustworthy result. This general disillusionment gained further confirmation from a summary of the most ambitious programme of enquiry carried out in the nineteenth century. In 1901 Wissler (3) published the result of ten years' work at Columbia University with an elaborate series of experimental tests, based largely upon Cattell's original scheme. The conclusions are about as negative as could be conceived. 'The laboratory mental tests', say the writers, 'show little inter-correlation ... The markings of students in college (1) Stumpf, for example, had written 'The power of mental concentration upon certain points, in whatever region acquired, will show itself effectual in all others also'. (Tonpsychologie, p. 123); and again 'Development of will power in connection with any activity is accompanied by development of will power as a whole'. (Psych. Rev., VI., p. 163).
(2) In passing it should be remarked that this conclusion forms one of the many replies to those who fear upon a priori grounds that tests claiming primarily to measure inborn capacities must be gravely disturbed by practice or experience in allied mental processes.
(3) Psych. Rev. Mon. Supp., 1901. [page 13] classes correlate with themselves to a considerable degree, but not with the tests made in the laboratory'. In Germany, so eminent a psychologist as Kraepelin (1) came also to an identical verdict. 'At the end of these inquiries', he writes, 'we cannot hide from ourselves that the results secured have fallen far short of what we had been led to anticipate from collective experiments with the simplest "mental tests"'. The sanguine hopes of Galton and his immediate followers seemed, then, to have come to nothing; and the notion of mental testing fell into temporary disrepute. (2) 13. After these discouraging pronouncements, for several years little work of any moment was done with mental tests. Suddenly, however, interest was reawakened; a new impetus and a fresh turn were imparted to the whole inquiry by two of the most original and suggestive investigators in this field, Spearman in England, and Binet in France. 14. The Coefficient of Correlation. Following the suggestions and the practice of English statisticians like Galton and Karl Pearson, Spearman now proposed to apply the exact method of correlation to psychological data, much in the same way as it had already been applied to the physical data of biometry and anthropometry; and, devising a short, simple and intelligible 'footrule' for calculating such coefficients, he did much to popularise the use of correlation among non-mathematical psychologists. (3) (1) Psych. Arbeiten. II., p. 324.
(2) It should be added that for the measurement of simple and specific capacities, such as sensory discrimination, reaction time, and the like, tests involving elaborate apparatus still remain indispensable. These more elementary capacities, however, are but remotely related to the work of the classroom; and most of the testing of this type is still carried out, not in schools upon children, but in psychological laboratories upon adult students. On the use and technique of such tests an excellent handbook is provided by Whipple's Manual of Mental and Physical Tests. Standard methods are there described in full. Detailed results are brought together; and complete references to the literature are appended for each test.
(3) Amer. J. Psychol., XV, 1904, pp. 72 sqq.; Brit. J. Psychol., II, 1906, pp. 89 sqq. A coefficient of correlation is a fraction or percentage expressing the amount of agreement between two series of measurements: for detailed explanations and examples, see Appendix V. [page 14] Previous investigators had trusted mainly to the coarse method of mere inspection. They made rough classifications of their examinees into bright and dull; glanced at their experimental data; and then declared that, according to their general impression, there was or was not a correspondence between one test and another, or between the tests on the one hand and the teachers' estimates of ability on the other, and that the correspondence was high or low. A procedure so subjective, though to this day too frequently relied upon by teachers and educationists, can hardly claim to be scientific. Little wonder that the earlier inferences were so divergent and conflicting. But this loose guessing was now to be abandoned. The use of a sound statistical method of comparison made it possible to ascertain conclusively the existence of a correlation between two varying qualities, apart from all personal bias, and to measure such correlations objectively, so that they themselves could be validly compared. When previous data were re-examined in the light of these more accurate devices, and the new methods applied to the old results, it could often be shown that the original deductions of various experimenters not only contradicted each other but were actually opposed to the true significance of their own figures. In this way, many of the bewildering discrepancies could immediately be resolved. By the same means, too, irrelevant sources of error could be gauged and eliminated: and, after this had been achieved, an appreciable correlation would frequently emerge, where none before had been suspected. 15. The Conception of General Ability. 
In Professor Spearman's hands the adoption of this new statistical weapon led at once to a remarkable triumph. Although at first attacked with considerable vigour, his generalisations have opened up an entirely fresh field and have initiated an entirely new campaign. Of his various conclusions the most important is the theory of a general ability underlying all the various mental activities that admit of being tested. Dr Johnson once declared that, had Newton applied himself to poetry, he would have written a great epic. A critic immediately objected that one man might have great learning, another keen judgement, another a fine imagination. 'No, sir', replied Dr Johnson, 'it is only that one man has more mind than another. He may direct it differently; he may by accident desire to excel in this study or in that. Sir, the man who has [page 15] vigour may walk to the east, just as well as to the west'. (1) The same belief in the versatility of genius, in the diffusion of defect when defect is present, underlies Professor Spearman's hypothesis of a central fund of mental energy. His formulation of the hypothesis, however, was no mere ingenious conjecture. It was the outcome of an elaborate statistical analysis, applied to a long series of experimental tests. In England several researches were immediately commenced to test his methods and results. His mathematical procedure and his psychological conclusions have alike been sharply debated; (2) but, on the whole, the final trend of recent research has been more and more to corroborate his leading principles. At Oxford and at Liverpool, for example, early investigations (3) with tests of increasing complexity, intended to tap both lower and higher levels of the mind, appeared to verify the hypothesis of a so-called central factor, of a general ability, pervading not only simple sensory functions, but also radiating in different directions and to different degrees among all intellectual capacities. 
At the same time it was shown that the capacities reached by the more searching tests were apparently for the most part hereditary or congenital; and that, unlike the ordinary scholastic examination, these devices were measuring not so much acquired capacity - knowledge gathered by memory, or dexterity gained (1) Boswell, Journal of a Tour to the Hebrides, Aug. 15 (Carruthers' ed., p. 16). The distinction between general and special intelligence was recognised by Aristotle [quotation in Greek] Nic. Eth. VI. vii. 2).
(2) The technical criticisms of Spearman's doctrines cannot be discussed in detail here. Dr William Brown and, more recently, Professor Godfrey Thomson (The Essentials of Mental Measurement, Revised Edition, 1921) have criticised the validity of the mathematical formulae used; Professor EL Thorndike (Amer. J. Psychol., XX., p. 364) and his pupil Simpson (Correlations of Mental Abilities) have criticised the doctrine of a general factor. Dr JC Maxwell Garnett has intervened mainly to the support of Spearman (Brit. J. Psychol., 1919, IX., iii. and iv., 1920, X., ii. and iii.). From later contributions upon either side, it now appears that the differences of view are by no means so complete or irreconcilable as they once appeared. Most of the misgivings seem mainly to concern the adequacy of the theoretical proofs offered. When it comes to practice, the most recent and the most cautious of the critics is found adopting the same working hypothesis and employing much the same tests for general educational ability (e.g. Thomson, Brit. J. Psychol., 1921, XII., iii.).
(3) Burt, Brit. J. Psychol., 1909, III., i., J. Exp. Ped., 1911, I., ii. [page 16] through practice - but rather something inborn. The two conclusions seemed to suggest a serviceable working definition for the vague term 'intelligence'. It was accordingly defined as inborn, general, intellectual efficiency. 16. The results of the researches just described showed that, for purposes of practical testing, complex processes were more important than simple. They agreed, indeed, with the earlier declarations that simple sensory and motor capacities were dependent upon intelligence; but at the same time they proved quite clearly that the degree of this dependence was comparatively small, so small in fact that it might easily be obscured by errors of measurement or crudities of analysis. With complex abilities it was different. It was now plain that from them far better results could be obtained. Higher mental processes - those, for instance, in which both sense perception and motor activity were combined - furnished tests far more effective than the lower; and, as a general rule, it was discovered that the higher and the more complex the activity tested, the closer was the correlation with intelligence. Instead, therefore, of measuring intelligence by the speed of simple tapping, the investigators now required the examinee to tap an irregular row of dots (the task involved in using McDougall's 'dotting machine') - each tap demanding a distinct effort of aiming and the whole experiment calling forth a high degree of sustained voluntary attention. Instead of asking him to draw a second line equal to the first, or to bisect a single long line, he was required to divide it into two parts in the same proportion as the two parts of a smaller line already divided. 
Instead of asking him merely to say any word associated with the given test word, he was required to name some word standing in a specified logical relation to the test word - an opposite, a synonym, a whole of which it formed a part, or a genus of which it formed a species: for example, to 'Black' he must reply 'White'; to 'Bad', 'Evil'; to 'Leg', 'Body'; or to 'Dog', 'Animal'. Sometimes the logical relations were not single and uniform, but manifold and mixed, as in the well-known test of Analogies. In this test, the examinee was shown three words, the first pair indicating a definite logical relation; and was required to work a sort of [page 17] 'rule of three' in words instead of numbers; for example: 1. Black is to White as Bad is to ...? Sometimes the logical relations had to be combined to form inferences, as in the so-called test of syllogisms and in the many other tests of reasoning; the following are simple illustrations: 1. Tom is taller than Jim; The foregoing instances may be regarded as simple tests of constructive reasoning - the earliest and the most commonly used type. Tests of critical or destructive reasoning were almost equally effective; and have recently formed the basis of an ingenious scale by Dr PB Ballard. (1) Tests of this latter type are usually termed absurdity tests. Binet had already introduced such a test, consisting of five absurdities to be explained by children aged eleven. The child was asked: 1. '"I have three brothers, Jack, Tom, and myself." What is silly and absurd in that?' Binet's three remaining 'absurdities' contained similar inconsistencies, of a rather gruesome kind to set before young children. But five questions do not make a scale. Accordingly, for this as for so many other tests, in order to multiply the amount of possible marks, investigators have adopted one of two possible devices; they have either compiled a number of discrete questions or statements, or else combined the whole into one consecutive test passage. 
Dr Ballard, in his absurdity test, employed the former method; and collected as many as thirty-four nonsensical statements, which he graded in a series of ascending difficulty. Others preferred the second method, illustrated by (1) 'The Limit of the Growth of Intelligence', Brit. Journ. Psychol., XII., ii. (1921), pp. 129-131. [page 18] the extract which follows. The child is required to read the passage and to discover as many absurdities as he can: Tests of these various types were found to correlate closely with intelligence. It will be observed that they depend upon no special knowledge; the words that have to be read by the examinee are well within the powers of children at the ages specified. Hence, the answers to the questions depend, not upon ability to read, but upon ability to think, not upon acquired information or skill, but upon native capacity of intellect. (1) The better tests, therefore, seemed, as a result of all the different experiments, to be those requiring for their performance the higher mental processes: and the best tests of all, those involving the highest levels of thinking, the ability to reason. Henceforward, for the testing of general intelligence simple sensory or motor capacities have been all but universally abandoned. Except, perhaps, when dealing with the youngest or dullest children, no one would now attempt to gauge educational capacity by simple measurements of skin discrimination or speed of tapping. (1) Of researches upon reasoning processes those of Mr WH Winch in England, (Brit. J. of Psych., 1914, VII. p. 190, and Journ. of Exp. Ped., VI. 1921, p. 121), and of Professor FG Bonser in America (Columbia Contributions to Education 1910, XXXVII), merit notice. A scale of graded reasoning tests for English school children was published in Journ. Exp. Ped. V. ii and iii (1919) pp. 68 et seq.; and an abridgement is reprinted in Appendix VIII. [page 19] 17. The Possibility of Group Testing. 
At the same time, these newer investigations showed, what had formerly been denied, that group tests might prove just as trustworthy for the measurement of intelligence as the individual tests which up to now had principally been favoured. (1) Tests of sensory or motor capacity, indeed, are difficult to apply to a number of individuals simultaneously; but tests of these higher and more complex capacities can readily be carried out as written class exercises: and the results so procured are both self-consistent and highly diagnostic. Of this conclusion one immediate result was that theoretical inquiries into the relations between various intellectual processes could be carried out much more rapidly and upon a most extensive scale. The practical corollary, the value of group testing for everyday purposes, was not developed until a later period in the history of mental tests. 18. Specific Abilities. Subsequent researches by Spearman and his followers led to the distinction of special capacities side by side with general ability; and the most recent investigations seem to show that any one concrete intellectual activity - such as memorising a given poem or working a given set of sums - may be considered to depend upon intellectual factors of three different orders: first, the general factor, common to all intellectual activities, and known usually as general intelligence; secondly, one or more special or 'group' factors, shared only by a limited number of intellectual processes; and, thirdly, specific or individual factors, peculiar to each particular test itself. The difference, no doubt, is principally a difference of degree. It seems possible to regard the 'general factor' as simply the 'group factor' that is of the most widespread occurrence: and the 'specific factors' as simply the 'group factors' that are most narrowly limited in their operation. 
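The three orders of factor just distinguished can be illustrated by a small simulation. The sketch below is not drawn from the researches cited: it assumes, purely for illustration, that each test score is a simple weighted sum of a general factor, one group factor, and a factor specific to that test, and the loadings (0.6, 0.4, 0.7), the two group names, and the function names are all hypothetical. It shows two consequences of such a scheme: tests sharing a group factor agree with each other more closely than tests that share only the general factor, and an average of several different tests follows the general factor more closely than any one test taken singly.

```python
import random


def pearson(xs, ys):
    """Product-moment coefficient of correlation between two series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n_children=2000, seed=1924):
    """Return (general factor values, one score list per test).

    Assumed loadings (hypothetical): each test draws 0.6 on the general
    factor, 0.4 on its own group factor, and 0.7 on a specific factor.
    """
    rng = random.Random(seed)
    groups = ["verbal", "verbal", "verbal", "arith", "arith", "arith"]
    general = []
    tests = [[] for _ in groups]
    for _ in range(n_children):
        g = rng.gauss(0, 1)
        group = {"verbal": rng.gauss(0, 1), "arith": rng.gauss(0, 1)}
        general.append(g)
        for scores, name in zip(tests, groups):
            specific = rng.gauss(0, 1)  # peculiar to this one test
            scores.append(0.6 * g + 0.4 * group[name] + 0.7 * specific)
    return general, tests


general, tests = simulate()
# Average each child's six scores into a single composite measure.
composite = [sum(child) / len(child) for child in zip(*tests)]

r_single = pearson(tests[0], general)        # one test taken singly
r_composite = pearson(composite, general)    # average of six tests
r_same_group = pearson(tests[0], tests[1])   # two 'verbal' tests
r_cross_group = pearson(tests[0], tests[3])  # 'verbal' against 'arith'

print(f"single test vs general factor:  {r_single:.2f}")
print(f"composite vs general factor:    {r_composite:.2f}")
print(f"two tests, same group factor:   {r_same_group:.2f}")
print(f"two tests, different groups:    {r_cross_group:.2f}")
```

With these assumed loadings the population correlation of a single test with the general factor is about 0.60, and that of the six-test average about 0.83; a sample of simulated children will give figures near these, though not identical to them.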
In the ordinary work of the elementary school the more important 'special' capacities (if one may trust the results of experiments on a somewhat limited scale) were shown to be the following: 1) arithmetical ability; 2) linguistic ability (which perhaps may be separated into two types - (a) the more elementary verbal factor entering into such simple activities as reading and spelling, and (b) the more highly developed literary factor entering into the activities of English composition); 3) manual ability (which no doubt has also several subordinate (1) See J. Exp. Ped., 1911, loc. cit. sup. [page 20] forms); and perhaps also 4) artistic ability and 5) musical ability. (1) Special abilities of a more elementary and strictly psychological kind have been found far harder to determine. As already observed, in supposing that definite faculties could be measured by one or two typical tests, the earlier experimenters proved to be much mistaken. The few recent inquiries that have approached the problem with the proper statistical methods - those of correlation and 'partial' correlation - have so far been unfruitful: they have as yet succeeded in isolating no special unitary functions; much less have they succeeded in devising for such functions any suitable specific tests. (2) 19. Mental Imagery. One early attempt, however, in the analysis of special abilities - qualitative rather than quantitative in its method - has of late fallen into somewhat undeserved neglect. Galton had classified men according to their predominating type of mental imagery, distinguishing the 'visualiser' from the 'audile', and both from the 'motile', and those again who think in visible pictures of concrete things from those who think chiefly by means of heard or uttered words. (1) See Distribution of Educational Abilities, pp. 56 - 63. 
Reference should here be made to the early work of Mr RC Moore in Dr Burt's laboratory at Liverpool, of Mr Bradford in Dr William Brown's laboratory at King's College, London, and particularly of Miss Nellie Carey in Professor Spearman's laboratory at University College, London; a full account of her researches in London County Council schools will be found in her articles on 'Factors in the Mental Processes of School Children', especially Part II. 'On the Nature of Specific Mental Factors', and Part III. 'Factors concerned in School Subjects'. (Brit. Journ. Psychol., 1916. VIII., i and ii, pp. 70-92, 170-182).
(2) A striking instance is the negative results of experiments attempting to demonstrate the existence of a special motor ability - one which casual experience would suggest was among the most easily proved. After a careful series of tests, carefully compared and analysed, Professor Muscio concludes: 'There can be no general motor test because there is no 'motor type'. Motor capacities appear to vary independently of one another.' (Brit. Journ. Psychol., 1922. XIII. ii, p. 184). It should be added, however, that the most recent investigations carried out in the psychological laboratory at Cambridge, and at present unpublished, indicate that this apparent independence of motor capacities only holds good on the simplest and lowest levels - a conclusion quite in accordance with the earlier researches on simple motor tests alluded to above, in Section 9. [page 21] For this purpose he used a test based on the familiar device of a standardised questionnaire. He asked his subjects to think first of some definite object, for example, 'your breakfast table as you sat down to it this morning', and to consider carefully the picture that rises before the mind's eye. 'Is the image dim or fairly clear? Are the colours of the china, the toast, the mustard, meat, or parsley, or whatever was on the table, quite distinct and natural?' 'Can you see mentally more than three faces of a die, or more than one hemisphere of a globe at the same instant of time?' 'Can you recall with distinctness the features of near relatives and other persons?' 'Can you hear in your mind's ear a note which is too high for you to sing?' 'Can you, with your lips open and your teeth apart, think mentally of such words as "bubble" or "putty"?' 'Can you in imagination hear the clinking of teaspoons, the slam of a door; or smell the odour of tar, of an oil lamp just blown out; or taste sugar, chocolate, lemon juice, or currant jelly?' 
According to the nature of their replies, Galton graded different persons, for each form of mental imagery, into what he termed eight octiles, ranging from those whose imagery was 'brilliant and distinct', to those whose power of concrete imagination was 'practically zero'. The detection of such 'imaginal' types was at one time thought to be of great significance for education. The visualising child was to be educated upon 'look-and-say' principles; the audile by 'phonic' methods. But the first enthusiasts, by their wild deductions and excessive claims, brought all such efforts into discredit. With other mental functions there have been similar endeavours to classify individuals according to qualitative types - for example, in memory and in attention; but these have stimulated less interest, and have led to little experimental work. (1) There can be small doubt that, if the teacher were able not only to measure his pupils' intelligence, but also to discover the different qualitative peculiarities of their minds, the individualisation of teaching methods would be enormously enhanced. (1) See for a summary of the chief contributions Meumann, Experimentelle Pädagogik, Xte Vorles., Stern, Differentielle Psychologie, Kap. VI., XIV. A convenient English summary will be found in Rusk's Experimental Education, Chap. XI. [page 22] The practical value of such analysis has lately been demonstrated in the case of children with special talents or special disabilities. As recent surveys both in Birmingham (1) and London (2) seem to have shown, the backward child is often backward only because of some special disability. The so-called 'word blind' child, for example, proves often to have been handicapped under a 'phonic' method of teaching by a poverty of auditory imagery, or under a 'look-and-say' system by a poverty of visual imagery. Similarly defects in the various forms of memory will produce grave backwardness in such fundamental subjects as spelling and arithmetic. 
When these underlying defects have been diagnosed, and the method of instruction appropriately changed, the backwardness (as the published records of individual cases show) may vanish in a year. (3) On the whole, however, it must be admitted that the measurement of the special elementary functions of the mind as yet has hardly begun. In their diagnosis the psychologist has to rely upon observation rather than upon testing. General intelligence has been found in comparison so easy to test, and of such widespread significance, that the testing of special functions has been largely passed over. 20. Meanwhile abroad, a long succession of inquiries, stimulated largely by the work of Dr Alfred Binet in France, had been tending, though by a wholly different route, to conclusions closely resembling those reached in England by the use of correlation. As early as 1895 Binet had published a programme for studying the relations existing between different psychological processes. (4) By means of ten tests for ten separate functions (1) BR Lloyd and C Burt, Report of an Investigation upon Backward Children in Birmingham. (City of Birmingham Stationery Department, 1921).
(2) C Burt, Mental and Scholastic Tests, LCC Reports, 1921.
(3) See Bronner, Psychology of Special Abilities and Disabilities, SL Hollingworth, Special Talents and Defects, and Lucy Fildes, 'Word-Blindness', Brain, XLIV. iii., pp. 286-307.
(4) L'Annee Psych. (1896), p. 411. [page 23] he hoped to secure, in the space of one or two brief hours, an exhaustive survey of any given personality. The practical success of Bertillon in determining so-called 'physical constants' by a few rapid measurements of individual criminals, seems to have aroused high expectations of achieving a 'psychic portrait' by methods no less simple and summary. 21. The Diagnosis of Mental Deficiency. Binet's Scale was first constructed for examining the intelligence of Parisian school children suspected of mental deficiency and recommended for transfer from the ordinary school to special classes; and it is for this purpose that the Scale has been increasingly used. At the suggestion of the late Professor JA Green, Miss KL Johnston went over to Paris to study the new methods; and on her return contributed an English translation of Binet's 1908 Scale to Professor Green's journal. (1) In the same year, Dr FG Shrubsall published a critical account of the tests, and of their uses in the examination of children for special schools. (2) In the following year, the Annual Report of the Chief Medical Officer for the Board of Education contained a summary of the entire Scale; and the tests were expressly recommended for use in cases of suspected mental defect. (3) 22. Simplified Methods. Binet began, as we have seen, with simpler tests of motor and sensory capacity, such as tapping, skin discrimination, and the measurement of reaction times. But, at a very early stage, he felt the need for testing processes of a more complex character, such as might approximate more nearly (1) Journ. Exp. Ped. I., i. (1911) pp. 25 et seq.
(2) School Hygiene (1911), pp. 613 et seq. 'The Examination of Mentally Defective Children'. Reference should also be made to the pioneer work carried out by the school medical officers in this country, particularly by Dr James Kerr (formerly School Medical Officer and now Medical Research Officer under the London County Council) and his various colleagues. Their investigations may be found incorporated from time to time in past Annual Reports of the Medical Officer (Education) published by the London County Council. The more recent studies by Dr EO Lewis, published in the Journ. Exp. Ped., IV. iv. (1918) pp. 198-202, and in Studies in Mental Inefficiency IV. ii. (1923) pp. 27-34 should also be referred to in the same connection.
(3) Annual Report for 1912. Appendix E. 'Schedule of Medical Examination of Children for Mental Defect'. [page 24] to the practical activities of ordinary life. He proposed, accordingly, to reject what he described as 'the brass instruments of the band of German psychologists'; to exchange the laboratory for the school; and to use for his experiments 'no apparatus except pen, paper, and a little ink'. In 1896 he began by examining 80 children, asking them to describe a simple picture, and classifying the descriptions into four or five generic types. He claimed that his conclusions formed 'the first achievement hitherto obtained from an experimental study of the higher intellectual faculties'. (1) These early experiments gradually led him to formulate a definite view of intelligence, which, though not published in full until a later date, guided the course of his main researches. He began by distinguishing natural intelligence and acquired culture. And then, contrasting the behaviour of the normal intelligent child on the one hand with that of unintelligent defectives and of the demented and insane on the other, he was led to separate mental processes into two kinds or levels - the lower and the higher. The lower processes consist of the simple reception of sensory impressions and (as he phrased it) the 'mechanical unchaining' of habitual associations. The higher processes are characterised by attentive coordination, manifested in three ways - through purposive direction, through active adaptation, and through conscious correction. The former are analytic; the latter synthetic. The former reproductive; the latter creative. The former are automatic and unregulated; the latter are controlled and self-critical, and exhibit progressive learning through trial and experience. These, in Binet's view, are the chief marks of intelligence. In any intelligent process, he adds, 'the greater number of our elementary faculties are involved ... 
Nearly all the phenomena with which psychology concerns itself are phenomena of intelligence'. Hence, a test of almost any faculty is, in a sense, a test of intelligence; and, to yield a satisfactory measure, not one test, but many different tests in combination, must be used. 'To cover a wide field of observation, it goes without saying that our tests must be manifold and heterogeneous'; and, 'instead of measuring the intensity of simple faculties, the vain ambition of the psycho-physicists, we shall measure acts of adaptation. Understanding the normal progress of development, (1) L'Annee Psych., (1897), p. 296. [page 25] we shall be able to determine how many years an individual is advanced or retarded'. (1) Though sometimes known as the 'French' procedure, (2) in contrast to the 'German' (which still clung, for the most part, to the traditional class of laboratory experiment), the new type of experiment was not exclusively confined to France. Two years after Binet's first contributions, Ebbinghaus, influenced in all probability, by Binet's fresh method of approach, attacked the problem from a standpoint entirely new in Germany. Being requested by the educational authorities of Breslau to assist in an official examination of certain schools of Silesia, he proceeded to invent a new test of intelligence, which he called a 'Kombinations-Methode' (completion method). (3) Like Binet, he began by analysing the real psychological nature of processes which men in ordinary life are accustomed to regard as intelligent: such processes, he thought, consist invariably in 'bringing together a multitude of separate and disordered fragments of experience into a single whole that is unitary, full of meaning, and, in some way or other, purposive'. Like Binet, therefore, he concluded that intelligence is exercised more in synthesis than in analysis, more in the reorganisation of sensory percepts than in their bare discrimination. 
Intelligence, in a word, is essentially a process of 'combination'. (4) It was this synthetic process, accordingly, which he aimed at eliciting by his new test. A mutilated passage of prose was (1) See The Development of Intelligence in Children, pp. 40 et seq., 253 et seq., and The Intelligence of the Feebleminded, Chap. XIII., 'A Scheme of Thought' p. 130 et seq.
(2) See Sharp and Titchener (Amer. Journ. Psych., X., pp. 348 et seq.), who expressly set out to investigate this new departure, but with no very promising results.
(3) Zeitschr. f. Psych. und Phys. d. Sinnesorg., XIII. (1897), p. 401. This important paper of Ebbinghaus has been described as 'the first great attempt to grapple with the problem'. But, as Binet himself points out, experimentation on these new lines in Germany, notably the Würzburg School's researches upon the psychology of thought, was really stimulated by his own early researches upon intelligence. And, though Binet's 'scheme of thought' was not published in explicit form until 1909, its origin and conception 'dates' (as he says) 'much further back'.
(4) It is impossible not to relate this view of mental activity with Sir Charles Sherrington's conception of the 'Integrative Action of the Nervous System' as a whole. See his Yale lectures published under that title (1906) and earlier papers there cited. [page 26] put before the examinee, and the latter was required to fill in the missing words, thus combining the disconnected words into an intelligible whole, by completing the sense. Between the order furnished by his test, and the order of ability in school work, he found a correspondence remarkably close. Ebbinghaus' procedure, though for long neglected, has since proved of much importance in the history of mental testing. His technique is adaptable to tests of very different types; and of late has even been applied for tests of special knowledge and information. The following specimen is from the opening paragraph of a completion test widely used in this country: 'Revenge is a ... of wild justice, which the more man's nature runs to, the ... ought law to weed it out. For, as for the first wrong, it ... but offend the law ... the revenge of that ... putteth the law out of office'. (1) The earlier completion-tests were based upon simple narratives or stories, depending largely upon imagination and visualisation for their understanding. A test, which, like the foregoing example, depends upon the power to follow and complete a logical argument, has been shown to yield, at any rate with older children, a far better measure of intelligence; (2) and it has even been ascertained that, for young children, the principle of the completion- or combination-method can also be adapted to pictorial instead of verbal material, with almost equally satisfactory results: (3) the test of 'reconstructing dissected pictures' (as it was then termed) has more recently been developed in various forms (4) and freely used as a test of the so-called 'performance' (or non-linguistic) type. 
In America, as has just been observed, (5) the first attempts to use these simplified methods, in accordance with the 'French' (1) From the first sentences of Bacon's Essay on Revenge.
(2) The Consultative Committee is, of course, of opinion that the test is of greater value if it does not offer any scope for memory. They recommend that sentences should be specially framed for this purpose. The Committee also suggest that tests involving reasoning power are more useful than those of mere pictorial representation.
(3) Journ. Exp. Ped., I., ii (1911), p. 102.
(4) For an excellent instance of this test, derived partly from the principle of the old 'form-board' test, see Healy, Psych. Rev., (1914), 'A Pictorial Completion Test'; and for a critical survey of work done with tests of this character see Pintner and Anderson, 'The Picture Completion Test' (Educ. Psych. Mon., Warwick & York).
(5) See footnote 2, p. 25. [page 27] procedure, were not encouraging. But later researches, both in England and in the United States, produced a large series of simple tests for higher mental processes, which played afterwards a considerable part in the examinations by the written or 'group' method. (1) Tests of this character, however, were for a while cast into the shadow by the great interest, aroused in America and elsewhere, by what is now known as the Binet-Simon Scale. 23. Intelligence Measured by an Age-Scale. Binet's most celebrated achievement was the construction, upon a plan entirely novel, of a metric scale (échelle métrique) for the measurement of intelligence. The final conception of this scale he reached only by steps and stages, so that there is in reality, not one Binet scale, but a series of several scales. In its latest revision the fundamental purpose of the scale was to measure the development of intelligence in terms of mental age. (2) With his new unit of measurement, his new and simplified methods, and his new and suggestive definition of intelligence, Binet now abandoned his earlier attempts to extract a quantitative grading in marks from a single consecutive test; and assembled instead a miscellany of test problems of the most varied character. A correct answer to each test problem was to count as an equivalent fraction of a mental year. His starting point, as we have seen, was not the usual experiments of the laboratory psychologist, but rather the empirical method of the teacher and the alienist ['one who treats mental diseases' OED]. Teachers and doctors had long been accustomed to examine their scholars or their patients by means of questions of a simple conversational type, questions meant for the most part to show the state of mind through general information or scholastic knowledge. 
This form of oral interview Binet adapted and improved by standardising (1) Among earlier American researches with tests of this type, the following deserve mention: Sharp and Titchener, Amer. J. of Psych., p. 348. Aikins, Thorndike and Hubbell, Psych. Rev., IX., 1902, pp. 374 et seq. - both mainly negative in their results. Of those yielding more positive results, Whitley, 'Tests for Individual Differences', Arch. of Psych., 1911, XIX., is one of the more important. A model attempt at standardising association tests of this kind is that of Woodworth and Wells, Psych. Mon., XIII., 1912.
(2) The concept of a mental age, though new to the psychologist, was not (as we shall find) new to the educationist. See below, Section 34, on the standards formulated by the codes of the Education Department at Whitehall in the sixties of last [i.e. nineteenth] century. [page 28] and grading the questions to be put. He asks the child, for example, such simple questions as these: 'What is your name?' 'Are you a little boy or a little girl?' 'How old are you?' If the child answers the first two correctly, but fails to answer the last, the examiner can already infer that the child's mental age is probably about four years, since a normal child of three can usually give his name and sex, and a child of five can state his own age. Of the various questions that make up the scale, many, like the three just cited, and the recognition and naming of four primary colours (a test for age five), of common articles such as a knife, key and penny, and of objects in a picture (tests for age three), and of the four commonest coins (a test for age six) presuppose merely the normal amount of parental instruction that would be given in an ordinary home. Others depend chiefly upon the teaching of the ordinary school, as naming the days of the week, the months of the year, and the date, or the exercises in reading, writing, dictation and simple computation. Others, again, such as the repetition of numbers progressively increasing in length - 32, 714, 8596, and so on - hardly depend at all upon experience or instruction. Finally, many of the tests still recall the old laboratory experiments upon sense discrimination: for example, the comparison of graded weights, and of lines differing in length. Binet's last version of his scale (1911) contained fifty-four such simple tasks or questions, five being allocated to each age of school life from three to the adult stage, with the exception of years XI, XIII and XIV. 
The child was assigned a mental age corresponding to the hardest group of questions correctly and completely answered. For the correct answer to any additional question suitable to a higher age, one fifth of a year was added. Binet and his collaborator Simon seem originally to have chosen their tests to gain information, not only about general intelligence, but also about special functions: they speak of such 'faculties' as sensorial perception, attention, memory, comparison, abstraction, and so forth. But later, this possibility seems largely to have retreated into the background; and certainly such light as the Scale yields in this direction is but a general dim glimmer with an occasional illuminating flash. Probably the most striking and the most unimpeachable principle involved in the Scale is that of the heterogeneity of the test problems. If, as we have found, every test process depends, not only upon general intelligence, but also upon specific [page 29] capacities - or 'faculties' (as Binet calls them), it follows that intelligence can never be measured accurately, unless the tests are sufficiently numerous and diverse to eliminate, by averaging, the varying influence of these specialised factors. Since Binet's day it has been a cardinal principle of intelligence testing never to employ (as had hitherto been done) any one test singly, but to combine tests of different types into an average measure. Binet did not check the validity of his scale as a whole, nor yet of its component tests, by the method of correlation, but this method was soon afterwards applied; and it was found that, at any rate with younger and subnormal children, the total results correspond very closely with independent estimates of intelligence. Not until quite recently, however, was the diagnostic value of the separate test problems investigated by an adequate statistical method. 
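By way of illustration only, the scoring rule stated above (the age of the hardest group of questions passed completely, plus one fifth of a year for each further question passed at a higher age) may be sketched in Python. The function name and its arguments are hypothetical and form no part of Binet's own text; the sketch assumes five tests to each year, as in the 1911 version.

```python
from fractions import Fraction


def binet_mental_age(basal_age, extra_passes):
    """Return a mental age in years under the rule described in the text.

    basal_age    -- age of the hardest group answered correctly and completely
    extra_passes -- number of further questions passed at higher ages
    Both names are illustrative assumptions, not terms used by Binet.
    """
    if basal_age < 0 or extra_passes < 0:
        raise ValueError("ages and counts must be non-negative")
    # Each additional question passed adds one fifth of a year.
    return Fraction(basal_age) + Fraction(extra_passes, 5)


# A child who passes every test for age seven and three tests beyond it.
age = binet_mental_age(7, 3)
print(float(age))  # 7.6
```

Exact fractions are used so that fifths of a year do not pick up rounding error; the result is converted to a decimal only for display.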
It was then found that the several problems differed enormously in their worth; some, such as the test of so-called suggestibility, (1) being of no significance whatever. 24. Revisions of the Binet-Simon Scale. Binet himself, as we have noted, published many different versions of his own Scale. (2) (1) The child is shown three pairs of lines increasing in length, the right being always longer than the left; he is then shown three pairs of lines of equal length. The dull suggestible child - the child who acts automatically upon any guiding idea that is suggested to him - still continues (or is expected to continue) mechanically to declare the line on the right is longer, even when no difference is in fact perceptible.
(2) The following are the chief articles published by Binet upon this subject:
[page 30] In the first form (1905) the tests were grouped according to stages rather than ages. It was only in the 1908 version that Binet showed the practicability of measuring intelligence with a single year as a unit. In the 1908 version, however, the number of tests assigned to successive ages differed greatly: and in the 1911 revision he decided to assign five standardised tests to every year. Even this revision, usually accepted as Binet's final version, was printed and reprinted in different books and pamphlets with minor modifications; and, almost to the day of his death, he was still engaged upon its revisal. It is clear that he did not in any way regard his published versions as ultimate. 25. (i) The Vineland Revision. In America the Binet Scale was welcomed with much eagerness. Dr HH Goddard (at that time Superintendent of the Training School for Defectives at Vineland, New Jersey) issued an early translation of the Scale; (1) and introduced a few slight changes, chiefly by adding one or two fresh tests (the well-known code (2) and clock (3) tests - the latter borrowed from an earlier publication of Binet). This translation was for long the standard version in use in the United States; and has formed the basis of nearly every American adaptation since. But the original claims of virtual infallibility, at first made by American enthusiasts, were not borne out by subsequent experience; and the need for a far more thorough revision, particularly in respect of the age assignments of particular tests, was more and more acutely felt. Of the later versions compiled by American psychologists, two are so drastic as virtually to constitute new scales, and deserve a brief description here. 26. (ii) The Yerkes Point Scale. The first is the Point Scale of Professor RM Yerkes. This consisted of twenty exercises picked, with one exception, from the original Binet-Simon series. (1) The Binet-Simon Measuring Scale: Revised Edition, Vineland, 1911.
(2) This test consists in translating from memory the words 'come quickly' into the proper symbols, letter by letter, according to a simple diagram code, said to have been used during the American Civil War.
(3) This test consists in interchanging in imagination the positions of the large and small hands of a clock - supposed to point originally e.g. to twenty-two minutes past six - and stating the time then indicated. [page 31] The addition was a test first used for the measurement of intelligence in England, and then named the Analogies Test. Many of the less satisfactory tests were discarded by Yerkes; and the results were computed not in terms of mental years, but in terms of simple points or marks. Instead of marking the results by the all-or-none, pass-or-fail method of Binet, partial credit was also given for various attempts according to their different merit. From a theoretical standpoint, the work done by Yerkes and his collaborators was richly suggestive. But for the two chief practical purposes of the Scale - the examining of borderline defectives at the age of admission to special schools, and the examining of supernormal children at the age of sitting for Junior County Scholarships - the Point Scale appears to be, at any rate for English children, in no way superior to the original. (1) 27. (iii) The Stanford Revision. The second and more recent version is the Stanford Revision and Extension, drawn up by Professor LM Terman and his collaborators in California. The Stanford Scale contains ninety tests. For most of the years there are six tests instead of five, so that intelligence is measured in terms of months, instead of in terms of decimals of a year. Of this revision the salient merit lies in the addition of many excellent tests, both new and old, principally for the higher ages. Terman, for example, asks the child to tie a bow-knot, to interpret fables, to explain with a pencil on a little plan how he would search for a lost ball in a circular field, to explain resemblances between things as well as their differences, to repeat numbers backwards as well as to repeat them forwards. 
One of his best known tests is to define the meanings of a graded vocabulary of 100 words, a test applicable at almost any age, and, according to Terman, the most effective in the whole list. Some of the new tests, however, still have a scholastic bias. For example, a test for age 14 consists of three arithmetical problems, such as the following: 'If a man's salary is 20 dollars a week and he spends 14 dollars a week, how long will it take him to save 300 dollars?' (1) For an excellent discussion of the Point Scale, with results obtained from English low-grade children, see Dr EO Lewis, 'The Binet and Point Scale Methods of Testing Intelligence'; Journ. Exp. Ped., iv. (1918) pp. 198-202. [page 32] And again, among the tests for superior adults there is this problem: 'A mother sent her boy to the river, and told him to bring back exactly 7 pints of water. She gave him a 3 pint vessel and a 5 pint vessel. Show me how the boy can measure out exactly 7 pints of water, using nothing but these two vessels and not guessing at the amount. You should begin by filling the 5 pint vessel first.' Unfortunately, too, Terman still retains many of the original tests of Binet and Simon, that have since been proved to be of little value. Hence the application of the Stanford Revision consumes even a longer amount of time than the original version. During the last year or two the Stanford version has been widely employed in this country, but no re-standardisation has yet been published, specially adapted for English children. Careful experiments show that the age assignments given by Terman, though more suitable than those of Binet, are still in many cases inexact for children attending English schools; and, with the sanction of Professor Terman himself, a further re-standardisation of these tests is now being carried out in this country. (1) 28. (iv) The London Revision. 
Meanwhile, an English re-standardisation of the original Binet-Simon Scale has been already published, based upon extensive experiments carried out in London schools. It was felt that, for the present at any rate, two alternatives were possible; first, to take the original version of Binet and Simon, which has formed the foundation of all subsequent work, and which has been used so extensively by school medical officers in this country, and re-standardise it as it stood; secondly, to carry out a wholesale reconstruction of the scale according to some entirely new scheme. Merely to revise the original selection by adding three or four new tests, and discarding one or two of the old, seemed of little service. A radical revision, however, must be the work of years. Hence, for provisional purposes, in compiling what may be called the London Revision (2) it was decided to adhere as closely as possible (1) For further observations on this Revision, see the Committee's comments below, Section 59.
(2) The Committee points out that this adaptation of the Binet-Simon scale for English children was arranged by Dr Cyril Burt in consultation with Dr Th. Simon. It is set out, with a useful commentary, in Burt's Mental and Scholastic Tests, London County Council Reports, 1921. [page 33] to the actual methods of Binet and Simon themselves. The only considerable alterations that have been made consist in a re-assignment of the individual tests to the more appropriate ages, as required by the results of careful experiments upon London children. The general framework of the original scheme, and all the original tests, have been retained: so that the results can be computed either in a form directly comparable with the original Binet age arrangement or in a form providing measurements, as accurate as possible with such a scale, for English children in English schools. 29. (v) The Treves-Saffiotti Method. A modification of the Binet-Simon Scale, which, among English-speaking investigators, has attracted but little recognition, is that elaborated in Italy by Professor U Saffiotti and Professor Z Treves. The changes proposed are principally two: first, the tests are to be grouped not only by age, but also by school class: secondly, the children tested are to be graded by description of mental quality instead of by arithmetical computation of years or marks attained. For children of a given age in a given class there are allotted three sets of tests - easy, medium, and hard. According to their success in these, the children are grouped as deboli, medii, and forti - dull, average, and able - and marked D, M, or F. If time allows, the children may be tested with other sets of tests besides those immediately suited to their level; and, according to their further success in these, the children in each group are again subdivided into three finer grades. 
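The two-stage grouping just described may be sketched as follows. This is a minimal illustration, not the authors' own procedure: the function name is hypothetical, and it is assumed that a small letter for the finer grade is prefixed to the capital letter for the main group.

```python
# Main groups from the first set of tests (capital letters).
MAIN_GROUPS = {"D": "deboli (dull)", "M": "medii (average)", "F": "forti (able)"}
# Finer grades from the further sets of tests (small letters).
FINER_GRADES = ("d", "m", "f")


def qualitative_grades():
    """List every combined grade, dullest main group first.

    Assumed labelling: finer grade (small letter) followed by
    main group (capital letter), e.g. 'mD'.
    """
    return [fine + main for main in MAIN_GROUPS for fine in FINER_GRADES]


grades = qualitative_grades()
print(len(grades))  # 9
print(grades)       # ['dD', 'mD', 'fD', 'dM', 'mM', 'fM', 'dF', 'mF', 'fF']
```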
There are thus in all nine qualitative grades - designated most conveniently by the letters dD, mD, fD (representing different grades of dullness), dM, mM, fM, and so on. The procedure has many obvious points of practical convenience; and in several ways is closely similar to that adopted in this country by school medical officers who have no time or no preference for quantitative measurements in terms of mental age. (1) The metric scale of Binet and Simon has been in general use with those teachers and school medical officers in almost every (1) An early account of the method is given in L'Annee Psychologique, 1912, p. 327: 'L'Echelle Metrique de l'Intelligence de Binet-Simon Modifiee selon la Methode Treves-Saffiotti'. A critical summary of Saffiotti's latest volume, La Misura dell' Intelligenza, will be found in the Eugenics Review, VIII, iv. 1917, pp. 365-373. [page 34] part of the world who have concerned themselves with psychological tests. First intended mainly to pick out the defective, its scope has been progressively enlarged, until in America it has been applied, in one form or another, to the measurement of intelligence among school children of every age and level and even to the measurement of intelligence among adults. It is to this day the favourite instrument for the diagnosis of mental deficiency in children. But for older and brighter individuals - such as candidates for junior county scholarships - it has now been almost entirely abandoned in favour of some other method - group tests, performance tests, or tests of reasoning and of higher mental powers. 30. (vi) The De Sanctis Tests. One other scale deserves description. A year after Binet published his first article on the measurement of intelligence, Professor de Sanctis, of the Laboratory of Experimental Psychology at the University of Rome, set forth in the same French journal a series of six tests, which he had worked out for grading mentally defective children. 
(1) Like Binet, de Sanctis had reached the notion of a step-like scale of problems, each harder than the last: but whereas Binet sought primarily to measure the amount of positive intelligence, de Sanctis was concerned chiefly to measure the degree of defect. De Sanctis, too, uses rather more apparatus than Binet - coloured balls, wooden cubes, pyramids and oblong blocks, a test-card representing squares, triangles, and rectangles, a screen for covering the apparatus, and a stopwatch for measuring the speed of the child's responses. The nature of the problems can be gauged by citing some of the seventeen test questions given as applicable to feeble-minded children of almost any age. The examiner putting before the child the blocks of wood of various shape, and showing him a cube, says to him: 'Pick out all the pieces that are like it'. Or, again: 'Look at this card: and (1) L'Annee Psychol., (1906) pp. 70-84, 'Types et Degres d'Insuffisance Mentale'. The Italian account was published in Annali di Nevrol. (Naples, 1906), 'Tipi e gradi d'Insufficienza Mentale'. A description in English is given in Whipple, Manual of Mental & Physical Tests (1st Ed. 1910, pp. 469-473. Omitted from 2nd edition). A leaflet containing a revised version of the tests (including minor modifications suggested by Madame Montessori and others) can be obtained from Professor de Sanctis ('Reattivi De Sanctis' per la Valutazione dell' Insufficienza Mentale degli Anormali, Mod. 1914). [page 35] point to the things that have the same shape as this piece of wood' (a cube) ... And later he asks: 'Are big things heavier or lighter than small things?' ... 'When things are far away, do they look larger or smaller than things that are near?' Several attempts to standardise and evaluate the de Sanctis tests more precisely have been carried out during recent years. 
(1) In this country the chief investigation on their use is that undertaken at the Baldovan Institution for Feeble-minded Children by Dr WB Drummond. From his results it would appear that the tests 'afford a rapid and practical means of classifying the mentally defective', but are less suited for differentiating the defective from the normal, or for grading the intelligence of normal children even at the earlier ages. He writes that 'the de Sanctis tests may be utilised as substitutes for some of the tests in the Binet Scale, but they cannot entirely take its place'. (2) This judgement seems to express the general conclusion reached by the few psychologists who have tried the method. 31. During the last six years, group tests, for long overshadowed by the Binet-Simon Scale, have come widely into practical employment. In the early days of mental testing the need for group tests was hardly felt. Children were tested, not in large numbers, but in exceptional instances - notably the comparatively rare cases suspected of mental deficiency. The want was first experienced, not for children but for adults, and not in the school, but in the army during the emergencies of the war. Among all the achievements of psychological examiners, the (1) An early account of the tests, with some comments on their value for the diagnosis of mental deficiency will be found in Dr Shrubsall's article on 'The Examination of Mentally Defective Children', (School Hygiene, 1911, II. ii. pp. 609-612; a reprint of a report presented to the Committee of the British Association (Section L, Portsmouth, 1911) on 'Mental and Physical Factors involved in Education'). The chief American papers on the subject are Goddard, 'The Grading of Backward Children', reprinted from the Training School Bulletin (Vineland), 1908, and 'Mental Development and Measurement of the Levels of Intelligence', Journ. Educ. Psychol., (1911), pp. 
498-508, and L Martin and 'A Contribution to the Standardisation of the de Sanctis Tests', Training School Bulletin (Vineland, 1916), XIII, pp. 93-110.
(2) 'Observations on the de Sanctis Tests', Brit. Journ. Psychol., (1920) X. ii. and iii. pp. 259-277. [page 36] testing of nearly 2,000,000 recruits for the American army still remains the most remarkable. (1) In this examination the main object was twofold: to eliminate as rapidly as possible all who had not sufficient intelligence to be safely trusted with a rifle, and to discover all who possessed a sufficiently high ability to be immediately selected for training as commissioned or non-commissioned officers. Incidentally, however, the records and the results proved of great service in many other ways. In order to test large numbers with the utmost speed, the oral and individual methods hitherto employed by most psychologists were supplanted almost entirely by the wholesale use of written tests administered to large groups simultaneously. The particular tests assembled for this purpose were in part those already used by investigators of the so-called higher mental processes, and in part a series of new test problems devised expressly for the purpose in hand. So impressive and so successful were the results of this collective procedure, that, after the war was concluded, group tests of a similar character were rapidly introduced into many American universities and schools for the examination of entrants. During the last few years there have been compiled, and issued from that country, a large number of publications containing collections of group tests of intelligence, the Terman Group Tests, and Otis Group Tests, the National Intelligence Scale, and many others. (2) The tests set at any one examination usually run to five or six in number, and each test may contain from twenty to fifty short questions. It is, indeed, one of the most suggestive discoveries of recent psychological work that, for an efficient examination, it is far better to use a large number of short questions than a small number of long questions. 
The tests and the questions are generally printed in the form of a booklet. One booklet is distributed to each candidate, who writes or marks his answers (1) A complete and convenient account is to be found in Yoakum and Yerkes, Mental Tests in the American Army. See also Appendix VIII.
(2) See tabular list given in the Twenty-first Yearbook of the National Society for the Study of Education: Intelligence Tests and their Use. (Public School Publishing Co., Bloomington, Illinois, 1922, pp. 93 - 113). The table gives the name, author, number and nature of tests, the ages for which they are applicable, and time required, the publisher, the price, and references (where available) to descriptive accounts. [page 37] on the pages according to instructions. To avoid difficulties in evaluating the various answers that might be written by the candidate himself, a number of alternative answers are often presented to him, and he is required simply to underline the correct reply. For example: (1) In America it would probably be difficult to find, among all the larger and more progressive educational systems, any in which group tests of intelligence are not now being extensively applied. In Chicago, with a school population of nearly half a million, 50 per cent of the children - chiefly those who are backward or advanced - are regularly tested by the principals or teachers. In Washington (DC), in New Rochelle (New York), in Cleveland (Ohio), in Denver (Colorado), in Kalamazoo (Michigan), all the schools are using the National Scale of Intelligence Tests or some similar set. (2) The tests are sometimes carried out by the ordinary teachers, but more usually by teachers specially trained, or by special officers and experts. In Washington, for example, 'an expert psychologist newly appointed is to have a free hand in the classification of all pupils'. For the most part it would seem that the tests are applied chiefly at the entering grades or at the time of promotion to junior and senior high schools. (3) The grading and the promotion are determined by the results, but not, of course, by the group-tests alone. The teacher's rating, the child's school record, and often the results of individual testing, are taken also into consideration. 
The method of group-testing has been employed, upon a small and experimental scale, by school teachers and scholarship examiners in England. The Bradford Education Authority in 1919 adopted, for the purposes of junior scholarship examinations, a number of the written group tests first used in 1911 for an early research at Liverpool. (4) Two years later, at the request (1) For further examples, see Appendix VIII.
(2) See Appendix III.
(3) See Appendix VI.
(4) Annual Report of the Bradford Education Committee, 1920; cf. also Appendix II, p. 152. [page 38] of the Northumberland Education Authority, Professor Godfrey Thomson devised a set of group-tests for much the same object in what was one of the most notable experiments on the subject in this country. (1) It had been observed that nearly one third of the schools in the county of Northumberland presented no candidates for the ordinary scholarship examinations in English and Mathematics. These schools were largely small schools in isolated rural districts, such as the Cheviots and the Dales; and it appeared possible that, from lack of home culture, of town life, and of teaching facilities, many of the best county pupils might be handicapped in essay writing and arithmetic. Three thousand children were accordingly tested; and it was found that many of the most successful pupils resided in the remoter areas of the county. This early experiment was so successful that a group-test of intelligence has since been introduced on every occasion into the Northumberland examinations for such scholarships. Similar difficulties have been encountered by other English education authorities; and have been dealt with experimentally by similar means. Dr Ikin, Education Officer for Blackpool, (2) has tested a group of one hundred scholarship candidates for junior county scholarships with five of the better known group tests (the Terman, Otis, Northumberland, Simplex, and National Scale respectively). At Rugby, Mr Vaughan has applied a set of group tests, devised in London, first to certain selected forms and later to the whole of his school. (3) At Cheltenham Grammar School, Mr Dobson has applied the same tests (with others) both to the entire school and to candidates for scholarship and entrance. (3) These are but a few of the more notable experiments upon these lines. 
Where the results have been statistically analysed, it is found that the calculated correlations show a close correspondence with the results of independent scholarship examinations or of independent personal judgements. Where the tests and the scholarship examinations disagree, subsequent study of the children shows that the test has often revealed inborn ability which the scholastic examination failed to detect, owing to the child's lack of opportunity, at school or at home, for (1) Brit. J. Psychol., XII. 201. 1922. cf. Appendix II, pp. 153-155.
(2) See The Times Educational Supplement, 6 Oct. 1923, p. 444, 'Group Intelligence Tests'. cf. Appendix II, pp. 149-151.
(3) See Appendix II. [page 39] acquiring the necessary knowledge. None of the investigators, however, has as yet claimed that intelligence tests can do more than supplement written examinations of the ordinary scholastic type. Before a child can be admitted to a secondary school he must possess a certain minimum of educational knowledge; and this is to be gauged, not by a test of mental capacity, but by a test of scholastic acquirements. Group-tests have also been introduced into examinations for adults. At the London Day Training College, psychological tests have been employed for the last two years as a supplementary means of selecting candidates for the four year course of training for teachers. (1) At Bedford College, since 1921, the Psychological Department has tested incoming students of different faculties with tests designed to measure 'arts' ability, 'science' ability, and 'general ability' or intelligence. At University College, the staff of the Psychological Laboratory has similarly employed group-tests to test the intelligence of freshmen who have volunteered to sit for the examination. These experiments are still too recent for any adequate comparison to be made with the subsequent academic careers of the candidates. In this country, however, the most extensive use of such group tests has been the introduction of a psychological test paper into the competitive examination for clerical posts in the Civil Service in 1920 and the following years. Nearly 40,000 candidates have been so tested; and an analysis of the published mark lists shows that the psychological test correlates with the general results more closely than any other single paper. (1) The candidates are young men and women between the ages of 17 and 19. A preliminary experiment was made in 1922, and the results were sufficiently useful to warrant an extension of the method. 
In 1923 all applicants took a half-hour's test paper, similar to that devised for the Civil Service competition (used also experimentally at the Bristol University Department of Education and at several other Training colleges). The evidence relied upon in estimating a candidate's merits is thus derived from three sources: first, the report of his secondary school; secondly, the impression produced by him during an interview - the interview itself being partly standardised in form; and, thirdly, the results of the psychological test. The test results are valuable, not only as an independent source of information about individual candidates, but also as a means of equating the varying standards of assessment implied in the reports from the different secondary schools. The estimates based independently upon each of these three methods will ultimately be compared with subsequent progress of the candidates in their academic and professional work. [page 40] 32. Both the Binet-Simon Scale, and the majority of the group tests in present use, are predominantly linguistic in character. For the most part they consist of verbal questions; and call for verbal answers. As a rule, this is an advantage; but in exceptional cases it may prove a serious drawback. In America, the large number of illiterates, and of alien children and adults who speak no English, gravely limits the use both of the Binet-Simon Scale and of the ordinary written tests. Occasionally, too, even in this country, the examining officer encounters a child who for various reasons is heavily handicapped for any test of a verbal type. Such a child, whether from nervousness or from some defect in hearing or speech, may be utterly unable to do himself justice in an examination consisting of nothing but oral questions and answers. 
Owing to the low culture of his home, or a lack of regular attendance at an ordinary school, he may possess not even that bare minimum of knowledge which the Binet-Simon Scale assumes; sometimes even an older and more intelligent child is unable to read, write, compute or recognise the common coins. The profound influence of these shortcomings has been clearly demonstrated in a recent research by Mr Hugh Gordon upon gypsy and canal boat children. (1) Such cases are exceptional, but they call for constant vigilance in routine examinations. To surmount this defect, several psychologists have sought to construct supplementary tests of a practical rather than a verbal character, known as Performance Tests. Instead of answering by word of mouth the child is required to do something, or to make something, with his hands. He is asked to fit together pieces of cardboard or wood so as to make certain shapes or pictures; to build up a large cube out of little cubes; to make (1) Mental and Scholastic Tests Among Retarded Children, Board of Education Reports, 1923. Mr Gordon's statement of the limitations of the Binet tests - based upon the American version - is quite consonant with the opinion generally held in this country. In the press, however, his statements have been carried beyond the cautious point to which he himself took them; and have been treated as criticism of the general use of such tests for ordinary purposes with children of ordinary schooling. For a reply to these too sweeping deductions, see the review by Dr PB Ballard, 'Mental and Scholastic Tests among Retarded Children', Forum I., iii. (1923), p. 250. [page 41] various movements in imitation of the examiner; or to number geometrical figures according to a key. Nearly all such tests need special apparatus to be obtained from manufacturers of scientific materials. Of these Performance Tests the oldest example and the commonest type is the form-board devised by Edouard Seguin (1846). 
It consists of ten wooden blocks of various shapes - a square, a triangle, a circle, a star, or a Maltese cross; each block has to be fitted into a hole of similar design; the time and the child's procedure are recorded. Seguin intended his form-board for training rather than testing; but in this country it has long been used as a test for the mentally defective. By altering the number, shapes, and subdivisions of the blocks, form-boards increasing in difficulty have been devised. A different type of non-linguistic test, used with success in an early research in England and since applied in various forms and upon an extensive scale in America, consists in the reconstruction of dissected pictures. This test, as we have seen, was suggested by Ebbinghaus' 'combination' theory of intelligence, embodied, however, in concrete instead of verbal material. In its simplest form, a suitable picture-postcard was cut up into eight or twelve small rectangles, and the child had to re-combine the fragments into a whole, with or without an intact copy before him - after the fashion of a jig-saw puzzle. (1) Yet another early test, also used in the first instance for testing the feebleminded, is the Age Scale of graded mazes devised by Porteus (1915). (2) A recent collection of fifteen of the better tests of the performance type has been made by Pintner and Paterson, who have drawn up standard instructions and published norms of achievement for children of various ages. (3) (1) 'Experimental Tests of Higher Mental Processes and their Relations to General Intelligence'. Journ. Exp. Ped., 1911, I., ii., p. 102. For later work with tests of this type, see Pintner and Anderson, loc. cit. inf.
(2) J. Exp. Ped., (1915), III., ii., pp. 127-135. 'Motor Intellectual Tests for Mental Defectives'. The necessary materials will be found reprinted in A Handbook of Tests for Use in Schools (PS King & Son, 1923).
(3) A Scale of Performance Tests (D Appleton & Co., 1917). Another collection is described by Dearborn, 'Form-Board and Construction Tests of Mental Ability'. J. Educ. Psych., VII. (1916) pp. 445 et seq. cf. id. 'A Series of performance tests of Intelligence', Harvard Mon. Educ. I., iv., 1923. [page 42] Performance Tests have been employed far more widely in the United States than elsewhere. The chief uses to which they have been put are the following: (i) To test the intelligence of the deaf. It was, indeed, for the purpose of classifying school children who were deaf or hard of hearing that the first scale of performance tests was devised by Pintner and Paterson. (1) (ii) To test the intelligence of speech defectives. Persons who have an impediment, whether physical or nervous, in their speaking, generally fail to give a just impression of their powers in the customary verbal tests. (iii) To test the intelligence of foreigners who speak no English, and are often illiterate. Such tests are now extensively used with immigrants at Ellis Island, New York; and were employed for a similar purpose in the United States army during the war. (2) (iv) To test children and young persons, who, from lack of schooling, are unlikely to do themselves justice in verbal and literary tests. For this purpose performance tests have been largely used in testing delinquents, truants, defectives, and children from ignorant homes. (v) To test children and young persons, who, on the ground of special linguistic facility (so-called 'verbalists') (3), are likely to show with too great an advantage in conversational tests of the Binet-Simon type. (vi) To test the intelligence of young persons for vocational guidance, where practical rather than intellectual ability is chiefly required. 
Experiments with such tests have lately been carried out upon school children in London by an investigator from the psychological laboratory of Bedford College, London, and, independently, by a second investigator trained in America and (1) See Pintner and Paterson, loc. cit. sup.; also id., 'Learning Tests with Deaf Children', Psych. Rev. Mon., XX., iii., p. 87 (1915), cf. id., 'A Class Test with Deaf Children', J. Educ. Psych., VI. (1915), p. 591.
(2) Knox, HA, 'A Scale based on the Work at Ellis Island for estimating Mental Defect'. Journal of the American Medical Association, 1914. See also Yoakum and Yerkes, Army Mental Tests.
(3) See Healy, W, The Individual Delinquent, pp. 473 et seq. Also Healy, W & Fernald G, Tests for Practical Mental Classification (1911), where some of the first and best-known performance tests are described. [page 43] working under the Industrial Fatigue Research Board - a department of the Medical Research Council. The results (1) show that tests of this kind are well adapted for measuring the practical intelligence of children who with a more linguistic type of test fail to reveal their true ability, such as the canal boat children alluded to above. Further, these tests not only arouse immediate interest in young people who show a distaste for an examination of an ordinary scholastic type, but also incidentally bring to light many important qualities of temperament and personal outlook. It is clear, however, that both the methods and the results need further standardisation before they can be applied in this country with success. 33. In a history confined primarily to psychological testing, the development of tests of educational attainments need only be touched upon very briefly. 34. The Standards of Former Codes of the Education Department. The conception of standard tests, based on age-performance and measuring educational capacity and attainment, is by no means a new idea, due wholly to recent psychological research. The Report of the Newcastle Commission on the state of popular education in England issued in 1861 a recommendation that a grant should be paid in respect of every child, who, having attended the Elementary School in the year preceding the day of examination, had passed an examination in Reading, Writing, and Arithmetic. (2) As a result of this recommendation, a provision was introduced by the Committee of Council on Education into the Revised Code for 1862, (3) stipulating that every scholar for whom Grants were claimed must be examined according to one of six standards (each briefly described by the Code) in the subjects specified. 
The ultimate result of this regulation was (1) To be published shortly.
(2) Report of Royal Commission on the state of popular education in England (1861), Vol. I., p. 545, Recommendation 6.
(3) Revised Code for 1862, Arts. 40, 46-48. The issue of the Revised Code for 1861, Arts. 43 and 44, had provided for four groups of children of the following ages, 3 to 7, 7 to 9, 9 to 11, 11 and upwards, with standardised ranges of attainment for each group in reading, writing and arithmetic. [page 44] the organisation of elementary schools on a basis of annual promotion. Each class in the senior department corresponded to an age group; and the whole series of classes were numbered standards I to VI, (1) roughly corresponding to the ages 7 to 12. The test cards set by the inspectors were in some ways strikingly similar to the group tests of scholastic attainments drawn up for successive years by psychologists of the present day. From the first there was much opposition to these arrangements. The objections of the teachers, however, were directed not so much against the method of testing as against the principle of 'payment by results'. The merits and defects of the standards then formulated, obvious as they now seem in the light of recent research, are not entirely uninstructive. These standards were based, not upon an experimental enquiry into what children of a given age actually knew, but upon an a priori notion of what they ought to know: they largely ignored the wide range of individual capacity; and the detailed formulations for the several ages were not always precise or appropriate. (2) In the course of thirty years the first strict conditions were gradually relaxed, more and more freedom of classification being given, the tests made more and more elastic, and the examinations being taken by sample only. From about 1892 the system of examination by standards began to fall into disuse; and it was finally abandoned by the Board of Education about the beginning of the present century except for a few special purposes, such as examining candidates for Labour certificates. 35. American Tests. 
Outside Great Britain the first application of the newer methods of scientific measurement directly to the results of school teaching was made in 1898 by an American, Dr JM Rice, in an investigation upon spelling. (3) The protests of American educational experts and teachers against stereotyped (1) A Seventh Standard was added about 1882.
(2) A conspicuous instance is the statement that the child should be able to spell words from the same books as he used for reading. Recent work shows very clearly that to spell a given word is much harder than to recognise it when spelt already in print. As a rule a child can read most words a year earlier than it can spell them. In general, almost the whole of the first requirements proved to be too hard and stringent by the equivalent of about a year; and later the original requirements for Standard I were fixed for Standard II, and so on.
(3) 'The Futility of the Spelling Grind', Forum, XXIII., pp. 163 - 172, and 409 - 419. [page 45] methods of examination for attainment were even more vehement than the earlier protests in England. But five years later a committee on school efficiency was appointed at the Philadelphia meeting of the Department of Superintendence; superintendents themselves were rapidly converted; and the custom of school surveys, including investigations of educational attainments as well as of attendance, costs and equipment, developed with great speed, if not always with equal prudence. The earliest set of educational tests, however, standardised upon lines resembling those adopted in the measurement of intelligence, emanated once more from France, and was the work of three Parisian psychologists. In conjunction with Monsieur V Vaney, Binet and Simon attempted, besides their scale for natural intelligence, what they styled a 'barometer of instruction' - a set of graded exercises in reading, spelling and arithmetic. (1) The tests were compiled upon a rough and ready plan; and were of necessity adapted only to French requirements. In America, under the enterprising lead of Professor EL Thorndike, a number of test scales have been issued from time to time during the last twelve years, based upon an elaborate statistical analysis, and dealing with the measurement of the various subjects of the elementary school curriculum. (2) Of these the earliest appears to have been Thorndike's scale for estimating quality of handwriting. (3) For this purpose, however, the scale which has been the more widely used is that constructed by Ayres, based, not like the Thorndike scale upon the general merit of the script, but simply upon its legibility. Since the early experiments of Rice, the efficiency of spelling has been the subject of several tests, notably by Ayres (4) and Buckingham. (5) Reading has been the subject of innumerable (1) The Development of Intelligence (1905), (Kite's translation) p. 
70; see also Mentally Defective Children, (Drummond's translation) p. 54.
(2) A compact description of American tests of scholastic attainment will be found in Wilson & Hoke, How to Measure (Macmillan & Co., 1920). A reference to the bibliography in Appendix VII, however, will show that American publications on this subject are almost innumerable.
(3) Thorndike, 'Handwriting', Teachers College Record, March, 1910. Ayres, 'A Scale for Measuring the Quality of Handwriting of School Children'. Russell Sage Foundation, Bulletin No. 113.
(4) A Measuring Scale for Ability in Spelling.
(5) Spelling Ability: Its Measurement and Distribution, Teachers' College, Columbia. [page 46] tests. Among these perhaps the most notable are those of Thorndike, Kelley and Munroe. (1) In arithmetic the results of the long researches carried out by Courtis deserve especial mention. (2) For drawing and for composition scales once more have been drawn up by Thorndike. (3) More recently tests have been devised for history and for geography as well as for algebra, Latin, physics and chemistry. Apart from differences in idiom and in values for money, weights and measures, the chief disadvantage of the American scales lies in the custom of compiling averages and norms for school classes or 'grades' instead of for age groups. Such figures cannot, therefore, be adopted, as they stand, for English children or in English schools. 36. English Tests. In this country the pioneers have been Dr Ballard, Mr Winch, and the late Professor JA Green. Dr Ballard was early in the field with simple and effective tests of arithmetic (4) and reading; (5) he has lately collected a number of test scales, compiled by himself and others, in his books on Mental Tests and The New Examiner. Both as editor of the only periodical in this country dealing solely with educational psychology, (6) and as secretary of a Committee appointed by the British Association to enquire into Mental Factors in Education, Professor Green, together with his research students, was among the first to introduce and to investigate, not only the Binet Scale itself, but also experimental tests of particular school subjects, in their application to English schools. Mr Winch has employed original tests of arithmetic and other school subjects to investigate memory, fatigue, reasoning, and (1) Thorndike, EL, Improved Scales for Word Knowledge or Visual Vocabulary, Kelley, FJ, The Kansas Silent Reading Tests, Munroe, WS, Standardised Silent Reading Test.
(2) 'Measurement of Growth and Efficiency in Arithmetic', Elementary School Teacher, X, and later numbers.
(3) 'The Measurement of Achievement in Drawing'. Teachers College Record, 1913. 'Thorndike Extension of the Hillegas Scale for the Measurement of Quality in English Composition'. 1912.
(4) J. Exp. Ped., 1914, II. p. 396.
(5) J. Exp. Ped., 1915, III. p. 153.
(6) The Journal of Experimental Pedagogy, now renamed The Forum of Education, and edited by Professor Valentine, of Birmingham. [page 47] the transfer of training in children. (1) More recently, the London County Council has issued a set of standardised tests not only for native intelligence but also for attainments in the chief subjects of the elementary curriculum - such as reading, spelling, arithmetic, writing, drawing and composition; (2) and has published the results of an educational survey of a representative borough, carried out with the assistance of teachers by the application of such tests. (3) Of these later tests the majority are arranged upon an age basis: there are, for instance, for each successive year, ten words which the average child can read, ten words which he can spell, ten sums in mental, mechanical, and problem arithmetic which he can work, at the age assigned. It is then possible quickly and easily to compute an 'educational age' for every individual scholar in all the chief subjects of the elementary curriculum. (4) 37. The other important branch of applied psychological testing, namely, the testing of vocational aptitudes, bears upon the work of the schools only in an indirect fashion and in certain limited fields. Consequently, despite its rapid and remarkable expansion during the last five years, its history need here be but briefly recounted. By vocational psychology is understood the discovery and measurement, by scientific methods, of those special mental qualities in virtue of which a particular individual is naturally adapted for one occupation rather than for another. It is generally regarded as embracing two main divisions, termed respectively vocational guidance and vocational selection. (1) J. Educ. Psychol., 1910, I, i. and ii., 1916, VII. ii. Child Study, 1913 VI et seq., Brit. J. Psychol., 1911, IV, ii. 1914, VII. ii. J. Exp. Ped., 1913, II. ii, 1921 VI. iii. 
For the meaning of the phrase 'transfer of training', see above, Section 11, paragraph 2.
(2) Three Memoranda on Mental and Scholastic Tests, 1921. The test materials contained in this volume have been reissued in the form of a Handbook of Tests for Use in Schools.
(3) Three Preliminary Memoranda on the Distribution and Relation of Educational Abilities, 1917.
(4) For examples of such tests, see Appendix VIII. [page 48] Vocational guidance aims at finding the best occupation for a given person; vocational selection aims at finding the best person for a given occupation. The former falls within the scope of the teacher, so far as he has to prepare and recommend his pupils for whatever trade or profession may suit each best when due to leave the school; the latter falls within his scope, so far as he has to select particular boys or girls for trade schools, art schools, apprentice schools, central schools with an industrial or commercial bias, or similar institutions such as prepare their scholars for definite forms of employment. In neither instance are the aims of psychological guidance and selection wholly new. For many years the teacher who has taken a personal interest in the children under his charge has habitually aided them with advice upon the kinds of employment best suited to the special aptitudes of each. The numerous professional examinations - those which the lawyer, the doctor, the schoolmaster, the engineer, the accountant, the civil servant, must pass before they are qualified to take up their chosen work - are, in essence, vocational tests. And the well-known Board of Trade tests for colour-blindness are strictly psychological tests, worked out in the laboratory by psychological methods, for what is definitely a problem of vocational selection. But with one or two rare exceptions of this type, the tests and devices used until recently, both for selection and for guidance, have been eminently unscientific. In its scientific form, the history of vocational psychology is usually dated from two American experiments carried out during the first decade of the present century. 38. Vocational Guidance. The beginnings of psychological 'guidance' upon systematic and scientific lines are generally traced to the experiments of the late Mr Frank Parsons in Boston, USA. 
Parsons began by arranging conferences with all the boys of his neighbourhood who were to leave the elementary schools at the end of each year. His recommendations were based upon answers to a psychological questionnaire rather than upon performances in psychological tests. But out of these informal discussions there grew up a permanent office which was opened in 1908, where all Boston boys and girls were able to come for counsel and advice upon the choice of a vocation. (1) (1) Frank Parsons, Choosing a Vocation, Boston, 1909. [page 49] The Boston Vocation Bureau, thus established, rapidly stimulated a large number of American cities to come forward with similar plans. Educationists have been especially attracted by the movement. Most American high schools now have their vocational adviser; and a large number of educational systems have offices for vocational guidance. Tests of general intelligence and educational attainments are freely used; and tests for particular occupations have been introduced more recently, borrowed from the increasing array of specific trade tests worked out primarily by those engaged not in guidance but in selection. 39. Vocational Selection. The beginnings of psychological 'selection' are usually traced to an earlier experiment by Mr FW Taylor, an American engineer. (1) In the factory of a rolling machine company, in Massachusetts, Taylor endeavoured to select the best girls for the work of inspecting bicycle-balls by means of a test of reaction time. Some of the girls, already engaged as inspectors, were found to show a slow reaction time, and were dismissed - although this involved the dismissal of 'many of the most intelligent, hardest working, and trustworthy girls'. The broad result was that, after selection, thirty-five girls did, in a shorter working day, as much work as one hundred and twenty had done before selection. 
Neither the experiment of Parsons nor that of Taylor, was carried out upon what would now be considered strictly scientific principles. Their work, however, aroused active interest; and their writings, and those of other 'efficiency engineers', stimulated Professor Hugo Munsterberg, at that time head of the psychological laboratory in Harvard University, to publish a systematic account of the possibilities of industrial psychology, dealing in special detail, among other topics, with test methods for selecting the most suitable work and the most suitable workmen. (2) Munsterberg himself carried out several experiments on vocational selection. He devised ingenious schemes of testing for the telephone service, for the electric railway service, and for navigating officers of a large ship's company. His oft-cited experiment on the selection of tram drivers appears to have (1) Principles of Scientific Management, (Harper Bros. 1911) The experiment was first described in 1903.
(2) Psychology and Efficiency, (Constable & Co.) 1913. [page 50] been the first real experiment in vocational selection by genuinely psychological tests. (1) 40. Vocational Testing during the Great War. Starting in this somewhat sporadic and haphazard way, the usefulness of vocational testing was developed and demonstrated during the war. In England, under the Air Board and under the Admiralty, tests were devised and carried out by laboratory psychologists for air pilots, aeronautical observers, hydrophone operators, and for many other military and naval tasks needing special capacities of sense perception or special degrees of intelligence. In America trade testing was carried out in even greater detail. Skilled or efficient men were required immediately for over four hundred separate occupations. There was no time to train them; or to try them out by a period of probationary engagement. Misfits might mean a grave disaster. Hence, various trade tests - based upon the usual principles, and of the oral, pictorial, 'performance', and 'written group' types - were rapidly and successfully devised. (2) The work was undertaken by a Committee of the USA War Department (Committee on Classification of Personnel) operating through an Army Trade Test Division established in three separate centres; and was largely guided in its beginnings by Professor EL Thorndike (who had already (1) Munsterberg's method of selecting the best telephone operators was based upon a collection of eight tests: (1) immediate memory; (2) logical memory; (3) speed of movement; (4) accuracy of movement; (5) speed of association; (6) spacial judgement; (7) card sorting; (8) cancelling certain letters from a page of print.
(2) See J Crosby Chapman, Trade Tests: The Scientific Measurement of Trade Proficiency (George Harrap & Co.) 1922. [page 51] done so much in establishing other branches of psychological testing) in conjunction with Colonel WD Scott and Dr WV Bingham. After the war the same methods were naturally continued by the larger firms, and by such bodies as the Carnegie Institute of Technology and the American Civil Service Commission. (1) The movement thus begun has rapidly spread. France, Belgium, Holland, Spain, Switzerland, Germany - indeed, most of the civilised countries of the world, now possess institutes for vocational guidance, in which trained psychologists, working in close contact, not only with business firms and psychological laboratories, but also with school teachers and education authorities, carry out vocational tests and offer vocational advice. 41. Vocational Guidance among English School Children. In this country two main bodies have carried out investigations upon vocational guidance and selection. The Industrial Fatigue Research Board, a branch of the Medical Research Council, was established during the war to carry out, by physiological and psychological methods, scientific enquiries upon industrial efficiency. It includes a psychological committee; and, since the war, has carried out several enquiries upon vocational selection; and has published a useful review of the literature of vocational guidance. (2) The National Institute of Industrial Psychology, founded in 1921, includes among its principal aims 'the more efficient guidance of children in taking up their life's work'; and has recently established a vocational section. 
Its investigators have already carried out a number of experiments upon selection in schools, offices, factories, and firms of different types; and have constructed and published group tests for estimating 'intelligence' in older children and younger adults, and specialised tests for estimating aptitude for particular professions and trades. (3) (1) See more particularly the Fortieth Annual Report of the US Civil Service Commission (1923), especially 'Report of Research Section', pp. li-xcix.
(2) Reports of the Industrial Fatigue Research Board, No. 12. 'Vocational Guidance' by Professor B Muscio. (HM Stationery Office, 1921), 1s. net.
(3) The results of these investigations are recorded from time to time in the Journal of the National Institute of Industrial Psychology. See specimen tests contained in Appendix VIII. [page 52] Under the Education (Choice of Employment) Act 1910, many local education authorities have drawn up schemes for placing children in suitable employment when they leave school. These recommendations are based largely upon the reports of the child's head teacher, usually entered upon a so-called 'school leaving form'. The report includes general observations as to the child's ability, character and conduct, a statement of the sort of employment which he desires and the teacher recommends, together with notes extracted from the records of the medical officer's last inspection. In London during the past eighteen months an investigation has been carried out in certain schools in a selected borough to determine how far these reports and recommendations can be made still more effective, by basing them upon a scheme of intensive psychological testing. A survey has been carried out of recent placements within the area chosen; the commonest occupations so secured have been analysed; psychological tests have been administered for general intelligence, for special attainments, and for the qualities needed for these commoner trades; and the children are being followed up to see how far the recommendations thus deduced have proved sound and beneficial. The experiment has been carried out as a joint research by investigators under the London County Council, the Industrial Fatigue Research Board, and the National Institute of Industrial Psychology. 42. Vocational Selection for Trade Schools and Apprentice Schools. The establishment of trade schools and similar institutions, and the award of trade scholarships, has made tests and examinations for trade aptitudes a definite element in the work of education authorities. 
As a rule, such examinations are partly based upon subjects of the traditional type (English - including history, geography and composition - and arithmetic); but also include more technical or practical subjects - such as drawing (freehand, geometrical, object, and nature drawing), woodwork or metalwork for boys, and cookery or needlework for girls, according to the type of scholarship to be awarded, and on these special weight may be laid. In London, by the initiative both of the Principals of three or four Trade Schools and polytechnics, and of investigators under the National Institute of Industrial Psychology, experiments have recently been made upon the possibility of adding to the entrance examinations vocational tests upon lines more definitely psychological. In the [page 53] apprentice-schools attached to two large engineering firms at Manchester, trained psychologists have also been enquiring into how far existing methods of selection can be improved by the introduction of more scientific modes of testing. With an equal measure of success, experiments have also been carried out with tests for the selection of dressmakers, shorthand typists, printers and various other trades. Several large English factories have appointed their own works' psychologists, whose duties are largely concerned with the selection of young people for their apprentice schools and workshops. The whole enquiry is still in an experimental stage, but positive results, of benefit to the firms and to their employees, to the trade schools and to their pupils, have already been obtained. (1) 43. Among the many factors that determine a child's educational progress, temperament and will are not less important than intelligence or knowledge. Accordingly, in the study of individual pupils, it is always desirable, though by no means always possible, to form an opinion upon the moral and emotional qualities of the child, as well as to test his intellectual capacities and attainments. 
This, indeed, is the very latest field into which psychological testing has penetrated. 44. The Testing of Neurotic and Delinquent Children. The analysis of character has been found particularly needful with those who suffer from moral or emotional disabilities, namely the delinquent and the nervous or neurotic. Of late years there (1) A brief account of the history and aims of vocational guidance will be found in Professor Claparede's little pamphlet on Problems and Methods of Vocational Guidance, (International Labour Office, Geneva, 1922); and some concrete suggestions on the possibility of vocational diagnosis in schools and among children will be found in the Lectures on Industrial Administration, edited by Professor Muscio (Pitman & Sons, 1920.) [page 54] has been an increasing tendency, both in this country and abroad, for neurotic children and juvenile delinquents to be referred for examination to a psychologist or a psychological clinic, wherever the facilities exist; (1) and, in all such cases, the measurement of intelligence and the study of temperament must go hand in hand. Neither is adequate without the other. 45. The Influence of Emotional and Moral factors on the Testing of Intelligence. But even with the normal, healthy child, emotional and moral qualities are apt to disturb the results of psychological testing in whatever form. In a group test of intelligence the lazy child may fail to exert his utmost powers. In an oral examination, as in the Binet tests, the shy or timid child may become suddenly confused, or nervously apprehensive, or altogether paralysed and mute. Hence, the testing psychologist must remain ever alert, lest, when he thinks he is testing intelligence, his final results may be vitiated by the irrelevant intrusion of excitement or emotion. And to overcome or counteract these tendencies numerous rules of procedure and devices of technique have of late been progressively contrived. (2) 46. Tests of Temperament and Emotion. 
Increased efforts have recently been made to measure temperament and character by methods more direct and exact, by tests similar to those which have been used for the precise estimation of 'intelligence'. But, although the idea of temperamental testing is almost as old (1) In Birmingham two psychological experts to assist the Justices have recently been appointed, one for adults and one for children. In London, teachers, magistrates, and organisers of children's care may refer neurotic or delinquent children for examination by the Council's psychologist; and an increasing number of such cases are now dealt with by such means. The Glasgow Educational Authority, as well as that of London, has now a psychologist attached to its staff, whose duties are primarily concerned with the testing of subnormal children (particularly defectives) and with the training of teachers in suitable methods. In America most large cities have their psychological clinic attached to the Courts, the universities, or the local administrative school authority. See Appendix III.
(2) A suggestive paper, dealing with some of these recurrent difficulties, is that of Augusta Bronner, on 'Attitude as it affects the Performance of Tests'. Psych. Rev., 1916, XXIII, iv. [page 55] as that of intellectual testing, it has had quite a different career. (1) Early observations and experiments upon the estimation of character and temperament were carried out by both Galton and Binet. Binet, in particular, besides his scale for intelligence, carried out many ingenious researches upon the measurement of conscientiousness, suggestibility, and fidelity of description and report. (2) But the first to approach the problem with an adequate statistical procedure was Dr Naomi Norsworthy. The procedure itself was borrowed from that used by Professor Cattell in a biographical study of American men of science. From her analysis of the various estimates of a teacher's personal character - estimates made independently by half a dozen students - Miss Norsworthy concluded that 'it would seem possible, by the use of some such method as the foregoing' (measurement by relative position or ranking) 'carried out on a much wider scale, to justify a list of character traits, numerical estimates of which by competent people would be both reliable and significant'. (3) Of all methods of investigating emotional tendencies by experimental methods, the oldest, the best known and the most widely used are the methods of associative reaction and of the so-called psychogalvanic reflex. The two are frequently employed in conjunction. There is a story which tells how an amateur detective, travelling one day by train in a full compartment, offered to guess the professions of his fellow passengers if each would give him, first of all, two meanings for a single word. The word was 'box'. Everyone began by saying that a box was a receptacle, of some firm material, and almost any size and shape - or some such phrase. 
But their second definitions differed altogether; and, according to the tale, at once disclosed some personal or professional interest. A lady, whose dress and complexion alone betrayed her calling, said a box was 'the best place in a theatre'.
(1) A good summary of the literature, with a detailed bibliography, will be found in Cady's article on 'The Psychology and Pathology of Personality: A Summary of Test Problems', J. Delinq., VII, 225 (1922).
(2) L'Étude Expérimentale de l'Intelligence, Paris, 1903.
(3) 'The Validity of Judgements of Character'. Essays in Honour of William James. 1908, pp. 553-567.
[page 56] The postman said it was 'something you received at Christmas'. The schoolboy thought it was 'something you received upon the ear'. A pugilist said it meant 'fighting with the fist'. A baseball enthusiast explained that it was 'the square where the pitcher stood'. A soldier described a sentry-box; a Scottish landowner a shooting box; and an engineer started talking of gear-boxes and axle-boxes. It is upon this principle - the varying directions taken by our spontaneous processes of thought - that the test of the associative reaction primarily depends. In its scientific form the experiment was first suggested by Galton fifty years ago; in his own case he found it gave a surprising insight into the ideas and contents of his mind. (1) A test of free association, of the continuous type, was later inserted by Binet into his scale of tests for intelligence; Binet asks the child to say in succession as many disconnected words as he can, in the space of three minutes. Later still, in the present century, the possibilities of such a test for detecting more emotional interests, whether conscious or unconscious, and particularly for discovering so-called repressed complexes, were developed by the psychoanalytic school, most of all by Dr CG Jung and his pupils. (2) The method devised by these last investigators, and now in current use, is roughly as follows. A list of words is first prepared, the majority calculated to arouse some emotional experience or memory; the words are called out, one by one, to the subject, and he is required to answer as quickly as possible with the first word that comes into his mind: e.g. 'Box?' 'Theatre'; 'Paper?' 'Fire'; 'Wish?' 'Drink'; 'Fear?' 'Mouse'; 'Teacher?' 'Preacher'; and so on. 
His answers are then compared with a standard collection already compiled from a thousand tested persons; the comparative frequency of each reply is noted; and the time each takes is measured, in fractions of a second, with a stopwatch or a chronoscope.
The so-called psycho-galvanic reaction was discovered accidentally by a Swiss engineer, EK Müller; and the experiment
(1) Inquiries into Human Faculty (1883) pp. 133-146. 'Psychometric experiments'.
(2) Studies in Word Association. (1918: a reprinted collection of earlier papers.)
[page 57] was studied psychologically by Jung and his colleagues at Zurich. (1) In this country the method was first tested and confirmed in the psychological laboratory at Liverpool University; and has since been the subject of numerous researches. (2) The discoverer, who happened to be holding the wires of a galvanometer, noticed the curious fact that, whenever he experienced an emotion, the resistance offered by his body to the passage of the current seemed to be momentarily lowered. It was afterwards found that, if electrodes be fastened to a person's hand, the deflections of the galvanometer show instantly whether his thoughts - excited it may be either by an inward reflection or by an outward stimulus such as a name or a pistol shot - are taking an emotional turn. With one and the same person, the degree of the deflection corresponds roughly to the intensity of his feeling; hence his chief emotional interests can easily be explored. Between different persons, comparisons are more difficult to draw, but, after certain allowances have been made for differences in their body resistance, the intensity or frequency of the deflections (as the most recent experiments suggest) will probably provide us, after further research, with one of the most trustworthy methods for measuring emotional susceptibility and temperament.
(1) Jung, Loc. cit., chapter XII. So early as 1888, however, the effect of psychical processes on resistance to electrical conduction had been studied by French scientists (Féré, Comptes rendus de la Société de Biologie, 1888, p. 217 et seq.; Vigouroux, Le Progrès Médical, 1888, sem. i, pp. 45 and 86). The Swiss investigations appear to have started from an independent discovery of the phenomenon, as described above.
(2) The fullest account in English is to be found in Whately Smith's book The Measurement of Emotion, 1922. The most careful research carried out in this country is that of Dr E Prideaux, who investigated the method in the Cambridge Psychological Laboratory for the Medical Research Council ('Expression of Emotion as shown by the Psycho-Galvanic Reflex', Brit. J. Med. Psych., II., i. (1921) pp. 23-46).
[page 58] 47. Tests of Character and Morality. Moral tests have also been investigated by similar means. (1) In this country, however, the few who have investigated the possibility of tests for character have preferred an indirect to a direct technique. The moral test is, as it were, disguised as a test of intelligence or information. A device, full of possibilities in this direction, is the optional question paper. Every teacher knows how, in an examination on languages or mathematics, the routine worker chooses the mechanical question, while the more enterprising select the problems and the riders; the cautious keep to the prepared texts, while the adventurous prefer the unseen translations. Mr Frank Watts (2) among others has endeavoured to construct question papers of the optional type, calculated to bring out such temperamental differences by the latitude allowed. More recently experiments have been made with pictorial instead of verbal matter: a set of pictures, artistic, humorous, or informative, are placed before the child (picture postcards supply almost unlimited material for such purposes); and he is asked to arrange them in order of preference or merit; the
(1) One or two American investigators, for example, have attempted to measure ethical discrimination by getting children to arrange a list of offences in order of wickedness. (Fernald, Amer. J. Insanity, lxviii., 547; Haines, Psychol. Rev., xxii., 303). 
Others have attempted to measure moral judgement by noting how often the child singles out moral reasons for certain actions, in preference to reasons of a general or a personal character. (SC Liao, Educ. Contr. Brown Univ., III., 1919. cf. Kohs, J. Delinq., VII., i. (1922) 'An Ethical Discrimination Test'.) Yet another, following up a suggestion of Binet, has elaborated group tests for honesty and conscientiousness; the children are required to trace mazes with their eyes shut; to fill up and correct completion tests with the key temptingly handy on the back; to state how much they know of various topics, with the prospect of earning a box of confectionery if they obtain full marks; the measure in these tests is the number of times the child yields to the temptation to cheat or to overstate. (Voelker, 'The Functions of Ideals and Attitudes'. Col. Univ. Contrib. Ed., 1921; Cady, 'The Estimation of Juvenile Incorrigibility', Journ. Delinq. Mon., 1923). In Cady's experiments - the most recent and the most thorough - the results of the tests described correlate with independent estimates of moral character up to .42. Sometimes (as in the last research) the examinee is also given a syllabus of questions relating to his own character: 'What kind of amusements do you prefer? Do you get on well with teachers and with other children? Would you like to wear jewellery and fine clothes? What do you think about when you are alone? What would you do if a lot of money were left you?'
(2) British Journal of Psychology, XI. 2.
[page 59] influence of special interests, working quite unconsciously if the pictures have been chosen with care, is nearly always evident; and, if a standard order has been previously obtained, the child's divergences can be measured numerically. (1)
48. The Importance of Observational Methods as Distinguished from Experimental. Few, however, would as yet pretend that such tests can merit anything more than an experimental interest; and, in their present state, the methods are unsuited for practical employment in this country. (2) In assessing temperament and character, therefore, we are bound to fall back upon the method of observation in place of the method of experiment. The personal interview is one recognised device; and another is the collation of reports submitted by competent observers who have been acquainted with the examinee during a long portion of his life. Interviewing and reporting each has its own technique; and in either case the technique is susceptible of great improvement by the application of simple scientific principles. Much, indeed, has recently been done by drawing up schedules of facts to be noted and observed, (3) and by contriving rating scales (4) for the registration of such facts in terms of a comparable scheme. More still, no doubt, is to be learnt from the practice of the expert alienist, from the methods and devices employed by the saner psychoanalysts in examining patients who are neurotic, delinquent, or temperamentally deranged. Nowhere is the art of interviewing carried out with such refinement and success
(1) An interesting but tentative set of 'Tests of Aesthetic Appreciation' has been described by Thorndike in an article with that title, J. Educ. Psychol., VII (1916) pp. 509 et seq.
(2) See the Committee's comments on these tests in Chapter II, Section 64, and Chapter III, Section 90.
(3) Of these, perhaps the most suggestive are those given by Webb, 'Character and Intelligence', Brit. J. Psych. Mon., i., and Hoch and Amsden, 'Guide to the Descriptive Study of Personality', Rev. Neur. Psych., xi., 577, cf. Psych. Rev., xxi., 295.
(4) On rating persons either by 'relative position' or by reference to 'key subjects' (a method elaborated with some success by the psychologists of the American army) a rich literature has grown up. See, among other references, The Personnel System of the US Army, vols. i and ii; Scott, Psych. Bull., xv. (1918); Thorndike, J. Appl. Psych., ii and iv. (1918 and 1920); and Rugg, J. Educ. Psych., xii. and xiii. (1921 and 1922).
[page 60] as in the consulting room of a good psychiatrist. But, when all is said, the problems of temperament and character still constitute one of the most difficult and urgent provinces for future psychological research.
49. These, then, are the numerous fields of application, into which, first in this direction, then in that, the new psychological methods have progressively spread. And this history, brief and cursory as it is, leaves two things indisputably clear. It shows that the science and practice of mental testing have successfully survived the first critical period of development; but that as yet they have by no means approached perfection or realised to the full their manifest possibilities. The tests and their uses are all still tentative, still experimental. Many of the expectations raised by the first enthusiasts are yet unfulfilled. Partly from the hasty and crude methods which from time to time have been adopted, partly because the whole problem has proved far more intricate than was at the outset assumed, the need for caution has become increasingly evident. On the other hand, the negative conclusions and the pessimistic verdict, announced by early sceptics towards the close of the nineteenth century, have not been confirmed by the further researches of numerous investigators in the twentieth. In the whole range of education the art of examining is probably the most difficult; and it was not until recent times that due attention was accorded to its proper principles. 
Gradually, however, as will have been remarked, the late experiments have become more limited. They have inclined to concentrate chiefly upon certain of the more general capacities, few in number and restricted in kind. The rest have been passed over. Until a few years ago, the work of the present century was, for the most part, directed to the measurement of one particular ability - intelligence. For the more specific intellectual functions - those underlying memory, attention, and the like - for the important qualities of temperament, emotion and mental character, and for the native aptitudes for particular occupations (as distinguished from attainments due to practice, training or experience) the few [page 61] relevant researches have yielded as yet hardly a single test method at once simple and trustworthy. But, with all these necessary reservations, the success and the widespread use of intelligence tests remain among the most remarkable achievements of modern experimental psychology. Finally, as regards general procedure, one striking fact emerges. Starting first of all with technical methods remote from the activities of everyday life, the psychologist has slowly approximated, so at least it must seem to the practical teacher, towards the traditional methods of the school and university - to the method of the oral interview and the method of the written examination. Yet this is no mere reversion; and the years of labour with tests now discarded have been by no means fruitless. From his laboratory the psychologist has brought with him something of value - a scientific technique capable of being rendered increasingly efficient and exact. His test questions must now be all carefully standardised beforehand; and he will rely, in evaluating his results, upon statistically elaborated norms. 
He thus endeavours to make both his procedure and his deductions as simple in practice, as sound in theory, and as precise in point of scientific method, as the complexity of his task and the limitations of his knowledge will for the time being allow. |