The Hadow Report (1924)
Psychological tests of educable capacity and their possible use in the public system of education

Appendix IV

NOTE BY Dr CYRIL BURT ON STANDARDISATION AND NORMS
[pages 183 - 185]

The work of 'standardisation' as applied to psychological tests has two sub-divisions: (i) standardisation of method, (ii) standardisation of results.

(i) Standardisation of method. It is a fundamental principle of scientific testing that for purposes of exact comparison the method employed must be the same for all examinees, for all examiners, and for all different occasions. This requires not only that the material (sums, problems, sentences, apparatus, pictures) should be essentially the same in all cases, but that the questions should always be phrased in the same terms. Teachers unfamiliar with the technique of psychological testing sometimes modify their questions freely when they repeat them, in the hope of rendering them more intelligible to the child: an essential feature of psychological method is that the formulae are practically invariable. Indeed, Binet expressly states that the novelty of his scale consists not in the nature of the questions, but in the fact that they are always set in the same way. (1)

Here, however, one reservation must be made. The similarity of conditions required must be not objective, but subjective. In dealing with children of different temperaments the examiner's attitude must be correspondingly modified or adapted so that the conditions may really, so far as possible, be the same for all. If the same test is repeated time after time with the same children, it is clear that the content should be varied so as to minimise the effect of use and familiarity. For this purpose it is therefore desirable to obtain alternative forms of equal difficulty, e.g. in the 'Opposites' test, where it is perfectly practicable to draw up half a dozen different sets, each containing fifty problems and each being of the same difficulty as the others. (2)

(ii) Standardisation of results. For most purposes it is necessary to compare the examinees not only among themselves, but in relation to a fixed standard. If this fixed standard represents an ideal towards which the examinee's work is expected to approximate, but which it is not expected to attain, it is a standard for achievement. If it represents the actual level of a normal class or group, from which there can be divergences above and below, it is a standard of achievement. Standards of the former kind are of more interest to teachers, who prefer to watch the progress of their pupils towards an external ideal; those of the latter kind are of more interest to psychologists, who are more concerned with facts than with aims. At a meeting of the (American) National Association of Directors of Educational Research it was recommended that standards for achievement should retain the name of standards, and that standards of achievement should be distinctively called norms.

Norms. By norms, in this sense, are meant specimens of work which represent the commonest type of achievement for the whole group in question. They constitute the means by which can be measured the degrees of abnormality shown by examinees above and below the normal. Hence they require at least two criteria: (1) a measure of the central tendency or average, (2) a measure of deviation.

The central or normal tendency for any given group is best measured by the average or arithmetical mean, which usually coincides with the mid-point between the extremes of abnormality - a 50 mark between zero and 100. It is most easily estimated when the work can be quantitatively marked. When the measurements are not in the form of quantitative marks, but are based on specimens or samples, e.g. in drawing or handwriting, the best way of identifying it is by using a 'median' or middle specimen.
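The two ways of locating the central tendency described above can be illustrated with a short sketch. The marks and the specimen labels below are invented for illustration and do not come from the report; the sketch only assumes that marks are numerical and that specimens can be ranked from worst to best.

```python
from statistics import mean, median

# Hypothetical marks out of 100 for one class (illustrative data only).
marks = [35, 42, 48, 50, 52, 58, 65]

# Arithmetical mean: the sum of the marks divided by their number.
central_mean = mean(marks)

# Median: the middle value once the marks are placed in order.
central_median = median(marks)

# When work cannot be marked numerically (e.g. handwriting samples),
# the specimens can still be ranked; the median specimen is the middle one.
# The labels are hypothetical and assumed to be ordered worst to best.
ranked_specimens = ["A", "B", "C", "D", "E"]
median_specimen = ranked_specimens[len(ranked_specimens) // 2]

print(central_mean, central_median, median_specimen)
```

In this invented example the mean and the median both fall on the 50 mark, which is the case the note treats as usual; with a lopsided set of marks the two would differ.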

To measure deviation a standard is required by reference to which the examiner can say not only that a given child is above or below the average, but by how much he exceeds or falls short of it. Binet, for example, measures the average in terms of mental ages, and the deviation from that average in terms of backwardness or advancement of one or more mental years. This is perhaps the simplest of all measures of deviation; others which are also in use will be found described in the numerous handbooks dealing with the application of statistical methods to education and psychology. (3)
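As a rough illustration of deviation expressed in mental years, the sketch below subtracts each child's actual age from the mental age assigned by a test. The function name, the children and their ages are all hypothetical, and the sketch assumes that the norm for a child is the mental age equal to his or her actual age.

```python
def deviation_in_years(mental_age: float, actual_age: float) -> float:
    """Return advancement (positive) or backwardness (negative) in mental years.

    Assumes the norm for a child of a given age is a mental age equal to it.
    """
    return mental_age - actual_age


# Hypothetical children: name -> (mental age, actual age), both in years.
children = {
    "child_1": (10.0, 8.0),
    "child_2": (7.0, 9.0),
    "child_3": (9.0, 9.0),
}

deviations = {
    name: deviation_in_years(mental, actual)
    for name, (mental, actual) in children.items()
}

for name, dev in deviations.items():
    if dev > 0:
        status = "advanced"
    elif dev < 0:
        status = "backward"
    else:
        status = "at the norm"
    print(f"{name}: {dev:+.1f} mental years ({status})")
```

The sign shows the direction of the deviation and the magnitude shows its size, which is the double requirement stated above.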

Footnotes

(1) Binet & Simon, Article on 'Le Développement de l'Intelligence', in L'Année Psychologique (1908), p. 60.

(2) A test is commonly said to be 'reliable' if it agrees consistently with itself, i.e. if, when applied on successive occasions, it yields approximately the same result with the same group. It is said to be 'efficient' if it agrees with some external criterion accepted as valid, e.g. the estimate of a competent and conscientious observer. The measure of agreement (usually expressed by a coefficient of correlation) shows the success with which the test has been standardised.

(3) Useful references both on the standardisation of tests and upon the definition and use of norms are Whipple's Handbook, 2nd ed., first three chapters, and McCall's How to Measure in Education, Part II, 'How to Construct and Standardise Tests' (see Chapter 11 for 'Determination of Reliability and Norms').
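Footnote (2) says that agreement between two applications of a test is usually expressed by a coefficient of correlation. A minimal sketch follows, assuming the product-moment (Pearson) coefficient is the one intended; the footnote does not name a particular formula, and the scores are invented for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of the paired deviations from the two means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Square roots of the summed squared deviations of each list.
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Hypothetical scores of the same five children on two occasions.
first_occasion = [12, 15, 9, 20, 17]
second_occasion = [13, 14, 10, 19, 18]

reliability = pearson_r(first_occasion, second_occasion)
print(round(reliability, 2))
```

A coefficient near 1 would indicate that the test agrees closely with itself. The same function could be applied to test scores and an observer's estimates to gauge what the footnote calls efficiency.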
