Methodology
 


The Canadian Task Force on the Periodic Health Examination uses a standardized methodology for evaluating the effectiveness of preventive health care interventions and for developing clinical practice guidelines based on the evidence from published medical research. This chapter reviews the process used by the Task Force to develop guidelines and introduces concepts of clinical epidemiology and statistics involved in the reviews that follow.

The periodic health examination includes a group of activities designed either to determine a person's risk of developing disease at a later date or to identify early, asymptomatic disease. It encompasses both primary and secondary prevention activities. The aim of primary prevention is to prevent the occurrence of disease through immunization or by reducing exposure to risk factors or modifying behaviours; the aim of secondary prevention is to identify asymptomatic individuals with early stage disease when such early identification promises a significantly better response to treatment than in those who first present with symptoms.

At its inception in 1976, the Canadian Task Force on the Periodic Health Examination adopted a plan to use explicit analytic criteria to guide its evaluation of effectiveness.<2> The rules were refined in collaboration with the U.S. Preventive Services Task Force in the 1980s,<3-6> but the basic premise of forming recommendations of graded strength based on the quality of published medical evidence remains unaltered. The greatest weight has been placed on the features of study design and analysis that tend to eliminate or minimize biased results. Table 1 summarizes the classification of recommendations, and Table 2 the grades of published evidence. The Task Force strives to provide a bridge between research findings and clinical preventive practice. When research does not provide clear guidance, this lack of evidence is articulated. A major objective is to help physicians choose tests, counselling strategies or other preventive interventions of proven utility and avoid those that lack demonstrated value. For example, the performance of a routine electrocardiogram in an asymptomatic individual may work to the patient's disadvantage by consuming time that could be devoted to considerably more effective interventions for preventing heart disease, such as counselling regarding smoking, dietary fat intake or exercise. Of course, the physician’s knowledge of an individual will dynamically affect clinical decision-making. Further, many important factors that influence the effectiveness of clinical preventive services, such as the benefits of a healthy, caring patient-physician relationship, are not captured by traditional research methods. However, this text uses a clinical epidemiology perspective to summarize what has proven to be effective in primary and secondary prevention, what is known not to work or to work less effectively and what is not known. Unanswered questions for each topic evolve logically into research priorities.

Table 1. Grades of Recommendations

A    Good evidence to support the recommendation that the condition be specifically considered in a PHE.
B    Fair evidence to support the recommendation that the condition be specifically considered in a PHE.
C    Poor evidence regarding inclusion or exclusion of a condition in a PHE, but recommendations may be made on other grounds.
D    Fair evidence to support the recommendation that the condition be specifically excluded from consideration in a PHE.
E    Good evidence to support the recommendation that the condition be specifically excluded from consideration in a PHE.

Table 2. Quality of Published Evidence

I       Evidence from at least 1 properly randomized controlled trial (RCT).
II-1    Evidence from well-designed controlled trials without randomization.
II-2    Evidence from well-designed cohort or case-control analytic studies, preferably from more than 1 centre or research group.
II-3    Evidence from comparisons between times or places with or without the intervention. Dramatic results in uncontrolled experiments could also be included here.
III     Opinions of respected authorities, based on clinical experience, descriptive studies or reports of expert committees.

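The analytic process utilized by the Task Force involves four major aspects, each discussed in turn below: defining criteria of effectiveness, reviewing the evidence, managing the committee analytic process, and developing clinical practice guidelines.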


Defining Criteria of Effectiveness

Of fundamental importance to effectiveness is whether performing the proposed maneuver is likely to result in more good than harm. Good and harm should be considered broadly. They extend beyond the ability of a maneuver to reduce the incidence or severity of its target condition and include its other effects. As an example, the use of aspirin by asymptomatic men at risk for coronary artery disease might be viewed as effective if it reduced the incidence of myocardial infarction.<7> If, however, long-term aspirin use also increased hemorrhagic complications, the morbidity and mortality associated with non-target conditions (i.e. bleeding) might outweigh the health benefit of reduced coronary artery disease.

At the beginning of the analytic process it is important to lay out a comprehensive list of potential benefits and risks of a maneuver and to adopt explicit analytic methods to ensure that each category of outcomes is evaluated adequately. The smallest size of benefit or risk that is clinically (as opposed to statistically) significant also requires clarification.

The strongest evidence that a preventive service is beneficial comes from well-designed studies with adequate follow-up that demonstrate that persons who receive the clinical action experience a significantly better overall clinical outcome than those who do not. Unfortunately, there are few such studies to draw upon. Most evaluative studies have examined the effects of prevention on an intermediate outcome. For instance, studies demonstrate the effectiveness of medication in the control of intraocular pressure but not the effect of therapy on the progression of glaucoma.<6,8-9> The analyst must infer (from epidemiologic evidence or separate intervention studies) that an effect on the intermediate outcome will lead to an effect on the target condition – an inference that may not be borne out in many cases.

A useful tool for mapping out the relationship between clinical events, proposed by Battista and Fletcher,<10> is the "causal pathway" to illustrate the sequence of events that must occur for a given maneuver to influence a target condition. For example, the causal pathway for the early detection of hypertension (Fig. 1) illustrates that the most direct evidence of benefit would come from causal link No. 5, studies demonstrating that asymptomatic individuals in whom blood pressure is measured (and then treated) are less likely to suffer the complications of hypertension, such as stroke. In the absence of such evidence it is often possible to infer effectiveness by combining causal links Nos. 1 and 4, or links Nos. 1, 2, and 3.

The causal pathway provides a visual summary of the type of evidence that should be reviewed. The causal pathway for screening tests clarifies the need to evaluate two causal links to infer effectiveness: 1) the ability of the early detection procedure to identify the target condition; and 2) the ability of a treatment intervention to achieve a favourable outcome. As evaluation of screening tests has been a major component of Task Force work, it will be discussed in more detail before turning to the review of evidence. Screening is used primarily in reference to case-finding, i.e. the detection of disorders at an asymptomatic stage in individuals who are being seen in the office or clinic for other reasons.

First, the ability of a test to detect early-stage disease requires examination of sensitivity, the proportion of persons with the condition who are correctly identified by the screening test, and specificity, the proportion of persons without the condition who correctly test negative. A test with inadequate sensitivity means a significant proportion of persons with the disorder will escape detection. For any given sensitivity and specificity, the likelihood that a positive test result indicates disease depends on the prevalence of the disease in the population of interest. If a disease is rare, a larger share of the positive results will be false positives. Therefore, it is important to determine the positive and negative predictive values of the test in the population to be screened (the proportion of true positives among the "positive" test results and the proportion of true negatives among the "negative" test results, respectively). For this reason, it is also at times appropriate to screen populations with a higher prevalence of disease (high-risk groups) but not to screen the general population. When prevalence of a condition is high (as in the high-risk population), positive test results are more likely to be accurate.
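The dependence of predictive values on prevalence can be illustrated with a short calculation. The following is a minimal sketch in Python; the function name and the test characteristics (90% sensitivity, 95% specificity) and prevalences are hypothetical values chosen for illustration, not figures from the Task Force reviews.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied to a population.

    All three arguments are proportions strictly between 0 and 1.
    """
    # Expected fraction of the screened population in each cell of the 2x2 table.
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)

    ppv = true_pos / (true_pos + false_pos)   # true positives among positive results
    npv = true_neg / (true_neg + false_neg)   # true negatives among negative results
    return ppv, npv


# Hypothetical test applied to two hypothetical populations.
for label, prevalence in [("general population", 0.01), ("high-risk group", 0.20)]:
    ppv, npv = predictive_values(sensitivity=0.90, specificity=0.95, prevalence=prevalence)
    print(f"{label}: PPV = {ppv:.2f}, NPV = {npv:.3f}")
```

Under these assumed values, only about 15% of positive results in the low-prevalence population reflect true disease, compared with about 82% in the high-risk group, even though the test itself is unchanged.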

Persons who are informed of false positive results may experience unnecessary anxiety until the error is corrected.<11> False positive results also lead to unnecessary diagnostic work-up, interventions or treatment. In a relatively healthy population, false positive results are more of a problem than false negative results, but the latter may also lead to a false sense of security, resulting in inadequate attention to risk reduction and delays in seeking medical care when warning symptoms appear.

The second requirement for proving the value of screening is to demonstrate the added value of early detection: to prove that asymptomatic persons with early-stage disease have a significantly better response to treatment than those who first present with symptoms. A study of appropriate design can show this. However, inferring this from studies that show a better prognosis for individuals treated for early-stage as opposed to late-stage disease (particularly those not diagnosed through screening), or only for individuals in a high-risk group, considerably weakens the evidence for screening asymptomatic persons.

Even if all available evidence from experimental studies suggests that a preventive service will achieve a favourable outcome, the procedure may fail to achieve the same beneficial effects under the less controlled conditions of day-to-day clinical practice. Thus, effectiveness may differ from efficacy due to factors related to: 1) the patient population and in particular their compliance, 2) the providers offering care (general practitioners as opposed to researchers with special expertise and a standardized protocol), 3) financial limitations, and 4) logistic limitations of the health care system as a whole.

Beyond discomfort and inconvenience, some tests may also result in physical complications. Examples include colonic perforation during screening sigmoidoscopy<12> and fetal damage during amniocentesis<13> or chorionic villus sampling to screen for congenital birth defects. Although the risk of such complications is often relatively small, even a small risk per screened person can outweigh potential benefits if the target condition is rare in the screened population.
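The point that a small per-person risk can outweigh benefit when the target condition is rare can be made concrete with simple arithmetic. The sketch below, in Python, is an illustration only: the function name, the complication risk, the fraction of detected cases assumed to benefit, and the prevalences are all hypothetical, and the model counts events without weighting their severity.

```python
def expected_outcomes(n_screened, prevalence, benefit_fraction, complication_risk):
    """Return (cases expected to benefit, expected complications).

    benefit_fraction is the assumed share of persons with the condition
    who gain from early detection; complication_risk is the assumed risk
    of a procedure-related complication per person screened.
    """
    cases_benefiting = n_screened * prevalence * benefit_fraction
    complications = n_screened * complication_risk
    return cases_benefiting, complications


# Same hypothetical procedure, applied where the condition is rare vs. common.
for label, prevalence in [("rare condition", 0.001), ("common condition", 0.05)]:
    benefit, harm = expected_outcomes(
        n_screened=100_000,
        prevalence=prevalence,
        benefit_fraction=0.2,
        complication_risk=0.001,
    )
    print(f"{label}: about {benefit:.0f} benefit, about {harm:.0f} complications")
```

With these assumed inputs, complications outnumber the persons who benefit when the condition is rare, and the balance reverses when it is common. A fuller analysis would also weigh how serious each outcome is.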

The results of screening tests can influence clinical decisions to perform interventions that are themselves associated with a certain level of risk. For example, data from routine electronic fetal monitoring suggesting fetal distress may prompt a decision to perform caesarean section, an operation associated with a measurable risk of perioperative morbidity and mortality.<14>

The psychological effects of labelling are another important complication of the results of screening tests. This is the damage done when we tell someone who feels well that they are sick. For instance, persons diagnosed with hypertension are at increased risk of work absenteeism and other behavioural changes.<15,16> Screening for HIV seropositivity may subject a person to discrimination and prejudice.<17> Forty percent of children whose parents believed they had a cardiac abnormality were found to have restricted daily activities, even though 80% had no clinical evidence of heart disease on careful examination.<18>

All of these factors need to be considered in establishing criteria of effectiveness. After establishing an approach to evaluation, the next step involves identifying the pertinent medical literature and reviewing it in accordance with the established criteria.

Review of Evidence

Literature Retrieval Method

The Task Force usually identifies the medical literature with a computerized search using MEDLINE. The keywords used for each topic and the date of the final search are listed under the Evidence subheading in each of the chapters that follow. The reference list is supplemented by citations obtained from experts and from reviews of bibliographic listings and other sources.

In general, animal investigations and studies that include individuals identified as being ill because they had symptoms are excluded. Evidence based on weak study design is excluded where stronger, more compelling scientific evidence is available. Clinical intervention studies are also given greater prominence than more indirect epidemiologic evidence of causal relationships between risk variables and preventable target conditions.

Documentation of the literature retrieval method is provided to make the review process more accessible to others and to ensure that the scope and pertinence of the literature review can be scrutinized.

Evaluation of Evidence

In evaluating the evidence, data from published reports are examined to determine whether a specific maneuver meets the criteria of effectiveness. The hierarchy of evidence places emphasis on study designs that are less vulnerable to bias and errors of inference, such as randomized controlled trials.

The assessment of quality is not concluded by assigning a study to a particular design category. Poorly-designed randomized controlled trials may provide less persuasive evidence than well-designed non-experimental studies. Thus, all studies must undergo critical appraisal for design strengths and flaws. A detailed review of these issues is beyond the scope of this chapter. However, fundamental concerns include: the presence of blinding, treatment of confounders, statistical power and sample size, population characteristics, a priori specification of hypothesis, data analysis methods and sources of bias including the proportions of persons lost to follow-up.

After the strengths and weaknesses of each individual study have been determined, results must be synthesized to form a comprehensive but usable body of evidence. Meta-analysis is still in a developmental stage<19-24> and currently is not used routinely for this function, but it is seen as a powerful tool for selected situations. The synthesis of multiple studies is usually done by reviewers on a less quantitative basis. The key features of the major studies, such as sample size, and the direction, magnitude and significance of effects are normally presented in a tabular or graphic format for easy comparison. Reviewers identify important patterns in results and examine the role of population characteristics and other confounding variables in accounting for differences in results.

These first two steps enable an assessment of the level of certainty that a maneuver is effective. This approach has been transferred to other situations for evaluation of technologies or non-preventive interventions by individuals or by groups. However, the Task Force itself acts as a whole to facilitate review of evidence in accordance with criteria of effectiveness and in a modified consensus development process to develop practice guidelines. The mechanisms developed by the Task Force to do this are described in the next section.

Managing the Committee Analytic Process

The Task Force has a stable panel of members and engages in a continuous process of revising previous recommendations and addressing new topics. Over the sixteen-year history of the Task Force, a gradual turnover of members with varied expertise has been ensured. The Task Force has maintained a mix of clinicians and research methodologists. Family practice, pediatrics, geriatrics and several other specialties are represented.

Topics to be reviewed may arise from challenges from the academic community, ambiguity regarding appropriate current practice, conflict between the recommendations of authoritative bodies, or suggestions from individuals, special interest groups or from government. Topic selection also depends upon publication of new research evidence and the personal expertise of the members. Where resources are limited, members assign priorities by ranking the list of possible topics.

Members are assigned specific topics and each has a mandate to submit background papers to the rest of the committee for discussion. Outside consultants are also asked to work with the Task Force on selected topics. Project support staff has been funded since 1988 by a research grant from the National Health Research Development Program. Staff members work under the direction of the chairman and the Task Force members. Health Canada also funds meetings and travel expenses through the Health Services and Promotions Branch. The Task Force meets 2 or 3 times each year for 1-2 days and all background papers are pre-circulated.

The interchange among expert panellists within the conference room permits the airing of important issues, clarification of ambiguous concepts and careful analysis of evidence and recommendations by the group. The advantages of informal discussion by experts and the process of achieving consensus include the opportunity to deal openly with important issues that are not easily quantified or addressed adequately in a more structured analysis (e.g. ethical issues).

At the same time, the personal opinions individuals bring to the process and familiar human characteristics (e.g. forgetfulness, fatigue, interpersonal conflict) can influence the recommendations that are developed. Consensus conferences have methodological limitations<25-31> and are sometimes criticized because only people with similar views are asked to attend. While the approach of Task Force members to the evaluation of the literature is similar (commitment to the evidence-based approach) and they have developed expertise regarding how the methodology works, members of the panel do not always approach the task from the same starting point. Time is taken to reconcile differing points of view. The evidence is presented and deliberated upon until a consensus finally emerges.

It is important to take advantage of the potential strengths of the consensus development process while at the same time adhering to procedural standards and work practices that maintain uniformity and impartiality in the analytic process. These include procedures to achieve adequate documentation, consistency, comprehensiveness, objectivity and adherence to the Task Force methodology.

Developing Clinical Practice Guidelines

The review of evidence of the effectiveness of preventive services serves as the principal basis for clinical practice recommendations. However, the review of the evidence is a conceptually distinct process from the setting of medical policy. Because of the health, economic and social implications of clinical practice guidelines, the scientific evidence must be viewed within the context of the clinical practice and the health care settings to which the recommendations will apply.

As a general rule, the strongest recommendations of the Task Force (A and E Recommendations) are reserved for preventive interventions whose value is supported or negated by high quality evidence (Type I – randomized controlled trials (RCTs)). Type II evidence is of fair quality and generally is associated with B and D Recommendations.
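The general rule just stated can be sketched as a lookup from the quality of evidence (Table 2) to a starting grade of recommendation (Table 1). This is a hypothetical illustration in Python, not a procedure the Task Force describes: the names are invented, and the mapping of Type III evidence to a C Recommendation is an assumption drawn from Table 1's wording ("poor evidence"). The result is only a starting point, since the grade may be adjusted for other reasons.

```python
# Quality tier for each evidence level in Table 2.
# "poor" for Type III is an assumption; the text only labels Types I and II.
EVIDENCE_TIER = {
    "I": "good",
    "II-1": "fair",
    "II-2": "fair",
    "II-3": "fair",
    "III": "poor",
}


def starting_grade(evidence_level, supports_intervention):
    """Return the grade the general rule would suggest before other factors.

    supports_intervention is True when the evidence favours including the
    maneuver in the periodic health examination, False when it favours
    excluding it.
    """
    tier = EVIDENCE_TIER[evidence_level]
    if tier == "poor":
        return "C"
    if tier == "good":
        return "A" if supports_intervention else "E"
    return "B" if supports_intervention else "D"


print(starting_grade("I", True))      # good evidence in favour
print(starting_grade("II-2", False))  # fair evidence against
```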

However, other factors come into play as the Task Force considers Canadian practice settings specifically and puts together evidence from various sources. Are the results of studies from the United States, Europe or other developed countries generalizable to Canada? What are the implications in terms of safety, acceptability and cost of clinical procedures to patients and physicians, not only in urban settings but in the variety of practice settings across Canada?

Examples of factors other than evidence that can affect the grade of a recommendation include: limited availability of a particular technology, demonstrated poor average compliance with a procedure and some potential for harm. In such cases the Task Force considers it best to err on the side of caution and not to advocate major changes in accepted practice. On the other hand, in cases where the burden of suffering is overwhelming, the Task Force will tend to be more proactive, since interventions of only minor effectiveness may translate into substantial health benefits for the population as a whole.

The burden of suffering is assessed by considering two factors: first, the impact of the particular condition on the individual, as assessed from the years of life lost, the amount of disability, the pain and discomfort, the cost of treatment and the effect on the individual’s family; and, second, the impact on society as assessed from mortality, morbidity and the cost of treatment. Ambiguity in the evidence regarding morbidity and mortality can also lead to more conservative recommendations (tending towards C Recommendations).

Although interventions are generally not recommended when they are linked to an increase in all-cause mortality or morbidity, the absence of a reduction in overall mortality or morbidity is not always a valid basis for recommending against an intervention. Even if the lack of change in a global outcome measure reflects the exchange of one cause of death, or one form of illness, for another, such an outcome may be desirable for patients whose risk preferences favour such an exchange. For example, the suffering that can precede certain causes of death (e.g. stroke) may make their prevention more desirable for some persons than the prevention of more acute causes of death. Bone fractures may be of greater concern to some women considering estrogen replacement therapy than the risk of endometrial cancer. Non-fatal health outcome measures also tend to be more problematic in that they have less uniform definitions and are less precise in terms of impact on burden of suffering.

The "number needed to treat" is another useful tool.<32> Estimates are made of the number of individuals in the population who would have to receive the intervention per case prevented. Interventions with a large "number needed to treat" may not be in the best interest of the population if treatment is associated with significant costs or harmful effects (e.g. iatrogenic side effects or labelling).
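The "number needed to treat" is the reciprocal of the absolute risk reduction, that is, of the difference in event rates between untreated and treated groups. A minimal sketch in Python follows; the function name and the event rates are hypothetical and are not taken from any study reviewed here.

```python
def number_needed_to_treat(control_event_rate, treated_event_rate):
    """Return the number of persons who must receive the intervention
    to prevent one case, given event rates expressed as proportions."""
    absolute_risk_reduction = control_event_rate - treated_event_rate
    if absolute_risk_reduction <= 0:
        raise ValueError("no absolute risk reduction; NNT is undefined")
    return 1 / absolute_risk_reduction


# Hypothetical rates: 4% of untreated and 3% of treated persons have the event.
nnt = number_needed_to_treat(control_event_rate=0.04, treated_event_rate=0.03)
print(f"Number needed to treat: about {nnt:.0f}")
```

The same relative reduction yields a much larger number needed to treat when the baseline event rate is low, which is one reason an intervention may be worthwhile in a high-risk group but not in the general population.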

Given the growing concern about health care costs, the Task Force attempts to furnish information about the cost effectiveness of recommended preventive interventions. Where possible, studies that have examined the costs and effects of an intervention are reviewed. However, information of this type is limited and where it does exist questions often arise concerning the appropriateness of study modelling assumptions including criteria of effectiveness and the equivalence of costs and benefits in the Canadian setting. Thus, the Task Force may only be able to describe a procedure or technology in general terms as costly and as a result tend to de-emphasize it.

The Task Force has increased the efficiency of its operations through close collaboration with the U.S. Preventive Services Task Force. On some issues there has been a division of labour (on a topic by topic basis) between the two groups. Several chapters in this Guide have been adapted from reports of the U.S. Task Force. Similarly, the U.S. Preventive Services Task Force will be adapting Canadian reports for the new edition of its Guide. Preliminary draft reports on all topics are exchanged. This allows committee members to proceed more rapidly to formulation of guidelines after careful consideration of the evidence in the local context. In most instances, the two committees have come to the same conclusions but sometimes there are minor differences of opinion regarding the strength of evidence and/or grading of recommendations.

All four grades of recommendation that are positive or negative (A, B, D and E) reflect a strong conviction that all physicians should adapt their practice to these guidelines. A C Recommendation means that there is poor or contradictory evidence regarding the intervention and that decision-making must be guided by factors other than the medical scientific evidence. Such interventions lend themselves particularly well to individual adaptation, taking into account the physician’s expertise and the patient’s risk profile.

Recognizing the diversity of issues that must be considered in developing sound practice recommendations, the Task Force accompanies each final recommendation with a clear and explicit discussion of the underlying rationale. Recommendations and background papers are then distributed to outside experts for peer review and revised appropriately. Detailed Task Force technical reports are published in peer-reviewed journals in English (Canadian Medical Association Journal) and in French (L’Union médicale du Canada). A full list of reports published to date is included as Appendix A.

Conclusions

Efforts to enhance scientific standards for medical information synthesis and the assessment of effectiveness are combined with a consensus development mechanism in the Task Force approach. Further refinement can be expected in the future to merge these approaches optimally in order to provide more meaningful recommendations and more rigorous accountability for the methods used to develop those recommendations. These efforts will lead ultimately to a more scientific approach to clinical practice decisions and to more effective and efficient use of health care services in general.

Selected References
 

  1. Woolf SH, Battista RN, Anderson GM, et al: Assessing the clinical effectiveness of preventive maneuvers: Analytic principles and systematic methods in reviewing evidence and developing clinical practice recommendations. J Clin Epidemiol 1990; 43(9): 891-905
  2. Report of a Task Force to the Conference of Deputy Ministers of Health (cat no. H39-3/1980E), Health Services and Promotion Branch, Department of National Health and Welfare, Ottawa, 1980
  3. Canadian Task Force on the Periodic Health Examination: The periodic health examination: 1989 update part 1, Introduction and Part 2, Early Detection of Colorectal Cancer and Problem Drinking. Can Med Assoc J 1989; 141: 209-216
  4. Lawrence RS, Mickalide AD: Preventive services in clinical practice: designing the periodic health examination. JAMA 1987; 257: 2205-2207
  5. Lawrence RS, Mickalide AD, Kamerow DB et al: Report of the U.S. Preventive Services Task Force. JAMA 1990; 263: 436-437
  6. U.S. Preventive Services Task Force: Guide to clinical preventive services: An assessment of the effectiveness of 169 interventions. Baltimore, MD: Williams & Wilkins; 1989
  7. Steering Committee of the Physicians’ Health Study Research Group: Final report on the aspirin component of the ongoing Physicians’ Health Study. N Engl J Med 1989; 321: 129-135
  8. Canadian Task Force on the Periodic Health Examination: The periodic health examination. 1986 update. Can Med Assoc J 1986; 134: 721-729
  9. Leske MC: The epidemiology of open-angle glaucoma: a review. Am J Epidemiol 1983; 118: 166-191
  10. Battista RN, Fletcher SW: Making recommendations on preventive practices: methodological issues. In: Battista RN, Lawrence RS (eds.): Implementing Preventive Services. New York, Oxford University Press, 1988; 53-67
  11. Sorenson JR, Levy HL, Mangione TW, et al: Parental response to repeat testing of infants with "false positive" results in a newborn screening program. Pediatr 1984; 73: 183-187
  12. Nelson RL, Abcarian H, Prasad ML: Iatrogenic perforation of the colon and rectum. Dis Colon Rectum 1982; 25: 305-308
  13. Campbell TL: Maternal serum alpha-fetoprotein screening: benefits, risks and costs. J Fam Pract 1987; 25: 461-467
  14. Shy KK, LoGerfo JP, Karp LE: Evaluation of elective repeat cesarean section as a standard of care: an application of decision analysis. Am J Obstet Gynecol 1981; 139: 123-129
  15. Lefebvre RC, Hursey KG, Carleton RA: Labelling of participants in high blood pressure screening programs. Implications for blood cholesterol screenings. Arch Intern Med 1988; 148: 1993-1997
  16. MacDonald LA, Sackett DL, Haynes RB et al: Labelling in hypertension: a review of the behavioural and psychological consequences. J Chronic Dis 1984; 37: 933-942
  17. Hermann DHJ: Torts: Private Lawsuits about AIDS, Yale U Pr, New Haven, Conn, 1987: 153-172
  18. Bergman AB, Stamm SJ: The morbidity of cardiac nondisease in schoolchildren. N Engl J Med 1967; 276: 1008-1013
  19. Thacker SB: Meta-analysis: A quantitative approach to research integration. JAMA 1988; 259: 1685-1689
  20. Jenicek M: Meta-analysis in medicine: Where we are and where we want to go. J Clin Epidemiol 1989; 42: 35-44
  21. L’Abbe KA, Detsky AS, O’Rourke K: Meta-analysis in clinical research. Ann Intern Med 1987; 107: 224-233
  22. Wittes RE: Problems in the medical interpretation of overviews. Stat Med 1987; 6: 269-280
  23. Sacks HS, Berrier J, Reitman D et al: Meta-analyses of randomized controlled trials. N Engl J Med 1987; 316: 450-455
  24. Yusuf S: Obtaining medically meaningful answers from an overview of randomized clinical trials. Stat Med 1987; 6: 281-294
  25. Oliver MF: Consensus or nonsensus conferences on coronary heart disease. Lancet 1985; 1(8437): 1087-1089
  26. Jacoby I: Evidence and consensus. JAMA 1988; 259: 3039
  27. Greer AL: The two cultures of biomedicine: Can there be consensus? JAMA 1987; 258: 2739-2740
  28. Wortman PM, Vinokur A, Sechrest L: Do consensus conferences work? A process evaluation of the NIH Consensus Development Program. J Health Polit Policy Law 1988; 13: 469-498
  29. Vinokur A, Burnstein E, Sechrest L et al: Group decision making by experts: field study of panels evaluating medical technologies. J Pers Soc Psychol 1985; 49: 70-84
  30. Perry S: The NIH consensus development program: A decade later. N Engl J Med 1987; 317: 485-488
  31. Institute of Medicine, Council on Health Care Technology: Consensus Development at the NIH: Improving the Program. Report of a Study by the Committee to Improve the National Institutes of Health Consensus Development Program. Washington, D.C., National Academy Press, 1990
  32. Laupacis A, Sackett DL, Roberts RS: An assessment of clinically useful measures of the consequences of treatment. New Engl J Med 1988; 318: 1728-1733



Copyright © 1997 Canadian Task Force on Preventive Health Care
Last modified: June 10, 1998.