Editorial
The Hong Kong Practitioner VOLUME 29 / May 2007

Assessment - We don't like it but we cannot do without it
Cindy L K Lam 林露娟

One has to go through at least 100 formative or summative assessments from primary school to university graduation in order to qualify as a doctor. Added to these are the Conjoint Fellowship Examination and the Exit Examination required to become a specialist in family medicine. Although we are, as described by the Chinese saying "身經百戰 (has experienced 100 wars)", experts in assessments, we still find them stressful, and some may even find them demoralizing. Assessment is one of those things in life, like taxation, that we do not like but cannot do without.

Assessment purpose

Assessment is an indispensable part of education and training to ensure that a candidate has achieved the required standard for the award of a qualification, promotion, progress to the next stage of training, professional development or quality assurance. Summative assessment that determines a pass or fail against a pre-set standard is the commonest method used in statutory or professional qualifying examinations. It is the best available method to assure that the minimum required level of competence has been reached. Formative assessment uses repeated measurements over a period of time, which is more suitable for monitoring the progress of training and for appraisal.

Ideally, professional assessments should assess the performance of the candidate, which is the highest level of competence, as described in Miller's Pyramid of Competence.1 The foundation of the pyramid is knowledge (know what), on which one builds skills (know how) and acquires the ability (show how), and the ultimate outcome is performance (put to practice). Different methods have to be used to assess different levels of competence. The widely used multiple choice questions mainly test knowledge, while the objective structured clinical examination (OSCE) is a popular method to test skills and ability.2 The assessment of performance, which requires direct observation of practice, is not as commonly done as it should be because it is very challenging to organize and standardize. Recent developments in the methods of practice-based assessment have enabled performance assessment to be done validly, objectively and reliably. The Leicester Assessment Package of consultations3 used in the Exit Examination of the Hong Kong College of Family Physicians (HKCFP) is an example of such a method.

Irrespective of the method, an assessment must be valid, reliable, fair and discriminatory in order to serve its purpose.

Assessment validity and reliability

Validity means that the test measures what it is intended to measure. Competence, like beauty, is a latent variable that cannot be directly measured. It can only be assessed indirectly through the use of correlated indicators that are observable. These indicators become the content (questions) of an assessment. In order to serve the purpose, the indicators should be important, relevant, representative of the target competence, and adequate in number to cover it. Reliability is a measure of consistency and reproducibility. It is a function of sample size: the larger the sample of domains, questions, examiners, and methods, the more reliable the results.4 It is important to note that reliability is a necessary but not sufficient requirement of a test. A very reliable test that is not valid is completely useless. For example, multiple choice questions are very reliable because they include a large sample, but they have low validity as tests of skills and performance.
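The link between sample size and reliability can be illustrated with a short sketch. It uses the Spearman-Brown prediction formula from classical test theory, which is not named in this editorial and is brought in here only as one common way to express the relationship; the function name and the starting reliability of 0.5 are hypothetical.

```python
def projected_reliability(current_reliability: float, length_factor: float) -> float:
    """Spearman-Brown prediction: reliability of a test lengthened by
    `length_factor` (e.g. 2.0 means twice as many comparable questions).

    Assumes the added questions behave like the existing ones, which is an
    idealisation rather than a property of any real examination.
    """
    if not 0.0 <= current_reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    r, k = current_reliability, length_factor
    return (k * r) / (1 + (k - 1) * r)


# Hypothetical starting point: a short test with reliability 0.5.
for factor in (1, 2, 4, 8):
    print(factor, round(projected_reliability(0.5, factor), 3))
```

Under these assumptions the projected reliability rises as the sample of questions grows, but with diminishing returns. The formula says nothing about validity: a longer test of the wrong thing is still a test of the wrong thing.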

Assessment criteria and standards

An assessment needs to be fair in that all candidates with similar competence should achieve similar outcomes. This is easy with multiple choice questions for which model answers can be set, but it is not as straightforward for other assessment methods. Criteria that define the desirable attributes or behaviour are required for OSCE and other tests on skills and attitudes. Generic criteria that are independent of the case content are needed for practice-based assessment.

The ultimate goal of a professional qualifying assessment is to distinguish candidates who have reached the standard (the required level of competence) from those who have not. The standard for a professional assessment should be absolute and based on pre-determined criteria, such as those used in the HKCFP Conjoint Fellowship and Exit Examinations. For example, the passing threshold for the HKCFP Exit Examination is 65%, which is defined as "consistently demonstrates capability in almost all components to a high standard and a satisfactory standard in all". Some assessments, e.g. the Hong Kong Certificate of Secondary Education Examination, use relative standards with a fixed passing rate, so that the outcome of a candidate depends on the standard of others. This is not appropriate for professional qualifying examinations because the competence level required of a professional should be absolute rather than normative.
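The difference between absolute and relative standards can be shown with a minimal sketch. The pass mark of 65 echoes the threshold quoted above; the cohorts, the 50% pass rate and the function names are hypothetical and do not describe how any real examination is scored.

```python
def passes_absolute(scores, pass_mark):
    """Absolute standard: each candidate is judged against a fixed mark."""
    return [score >= pass_mark for score in scores]


def passes_relative(scores, pass_rate):
    """Relative standard: a fixed proportion of the cohort passes.

    Ties at the boundary are broken by list order, which is an arbitrary
    choice made only to keep the sketch short.
    """
    n_pass = round(len(scores) * pass_rate)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    passing = set(ranked[:n_pass])
    return [i in passing for i in range(len(scores))]


# The same candidate (score 66, listed first) sits with two different cohorts.
weak_cohort = [66, 50, 45, 40]
strong_cohort = [66, 90, 85, 80]

print(passes_absolute(weak_cohort, 65)[0], passes_absolute(strong_cohort, 65)[0])
print(passes_relative(weak_cohort, 0.5)[0], passes_relative(strong_cohort, 0.5)[0])
```

With the absolute standard the candidate's outcome is the same in both cohorts. With the relative standard it changes with the performance of the other candidates, which is the objection raised above.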

One challenge with absolute standard setting is that the difficulty of the examination questions affects the scores that candidates can achieve. Different methods have been developed to adjust for this.6 One of them is the Contrasting Groups method, in which the examiner gives a global rating of the candidate's pass level in addition to the score against the assessment criteria for the question.6 The cut-off score of each question is then determined statistically from the distribution of the scores of borderline candidates. This method is used in the OSCE of the HKCFP Conjoint Fellowship Examination to ensure fairness of the assessment.
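As a rough sketch of the idea of deriving a cut-off from borderline candidates, the example below takes the median criterion score of the candidates whom the examiner rated "borderline" on one question. The editorial does not say which statistic is used, so the median is an assumption, and the data, rating labels and function name are all hypothetical. This is not the HKCFP procedure.

```python
from statistics import median


def borderline_cutoff(results):
    """Return a cut-off score for one question (station).

    `results` is a list of (criterion_score, global_rating) pairs, where
    global_rating is the examiner's overall judgement: "pass",
    "borderline" or "fail". The cut-off is taken as the median score of
    the borderline group (an assumed choice of statistic).
    """
    borderline_scores = [score for score, rating in results if rating == "borderline"]
    if not borderline_scores:
        raise ValueError("no borderline candidates to set a cut-off from")
    return median(borderline_scores)


# Hypothetical results for a single OSCE station.
station_results = [
    (82, "pass"), (74, "pass"), (70, "pass"),
    (61, "borderline"), (58, "borderline"), (65, "borderline"),
    (52, "fail"), (45, "fail"),
]

cutoff = borderline_cutoff(station_results)
print(cutoff)
```

Because the cut-off is derived from how candidates actually scored on that question, a harder question yields a lower cut-off and an easier one a higher cut-off, while the examiners' judgement of what counts as borderline remains the fixed reference.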

Conclusions

Professional competence is multi-dimensional; it is not possible to have one single test that can validly and reliably measure all the required indicators. Multiple methods have to be used to determine whether a candidate has achieved the required level of competence. An adequate number of questions and examiners should be used to improve the reliability. Clearly defined assessment criteria are essential to ensure fairness. Standards should be set at an absolute level to assure the quality of the profession.

Assessment is a complex form of measurement. All measurements reflect not only the true result (the candidate's competence) but also variations that are related to the assessment content and context, examiners and other unknown factors. Research has shown that candidate factors explain no more than 50% of the variance in the results.7 No assessment is perfect; we can only try to make it as good as possible through a rational choice of methods and questions, fair setting of criteria and standards, and adequate examiner training.



HK Pract 2007;29:177-179

Cindy L K Lam, MBBS (HK), MD (HK), FRCGP, FHKAM (Family Medicine),
Clinical Associate Professor,

Family Medicine Unit, The University of Hong Kong.

Correspondence to :
Dr Cindy L K Lam, 3rd Floor, Ap Lei Chau Clinic, 161 Main Street, Ap Lei Chau, Hong Kong.



References
  1. Miller GE. The assessment of clinical skills/competence/performance. Acad Med 1990;65(Suppl):S63-S67.

  2. Newble D. Techniques for measuring clinical competence: objective structured clinical examinations. Med Educ 2004;38:199-203.

  3. Fraser RC, McKinley RK, Mulholland H. Assessment of consultation competence in general practice: the Leicester Assessment Package. In: Harden RM, Hart IR, Mulholland H (eds). Approaches to the assessment of clinical competence. Part 1. Dundee: Centre for Medical Education, 1992:192-198.

  4. Downing SM. Reliability: on the reproducibility of assessment data. Med Educ 2004;38:1006-1012.

  5. Lau JKC, Fraser RC, Lam CLK. Establishing the content validity in Hong Kong of the prioritized criteria of consultation competence in the Leicester Assessment Package (LAP). HK Pract 2003;25:596-602.

  6. Norcini JJ. Setting standards on educational tests. Med Educ 2003;37:464-469.

  7. Van der Vleuten CPM. The assessment of professional competence: developments, research and practical implications. Advances in Health Sciences Education 1996;1:41-67.