Original article
Are randomized clinical trials good for us (in the short term)? Evidence for a “trial effect”

https://doi.org/10.1016/S0895-4356(00)00305-X

Abstract

Objective: To assess whether there is evidence that randomized controlled trials are systematically beneficial, or harmful, for patients. In other words, is there a “trial effect”? If so, to examine whether the evidence sheds light on the likely sources of the difference in outcomes.

Methods: Systematic review of the literature.

Results: We set out in some detail potential sources of a “trial effect” and potential biases. We found only 14 research articles (covering more than 21 trials) with relevant primary data. We extracted, with difficulty, quantitative data-sets from the articles, and classified these according to likely source of any apparent trial effect. The categories used were: differences in prognosis; superior treatment in the trial; and “protocol/Hawthorne effect” (benefit from improved routine care within a trial).

Analysis: The evidence available is limited in breadth (coming largely from cancer trials) and quality, as well as quantity. There is weak evidence to suggest that clinical trials have a positive effect on the outcome of participants. This does not appear to depend strongly on the trial demonstrating that an experimental treatment is superior. However, benefit to participants is less evident where scope for a “protocol/Hawthorne effect” was apparently limited (because there was no effective routine treatment or because the comparison group also received protocol care). A form of bias, arising if clinicians who tend to recruit to trials also tend to be better clinicians, could also explain these results.

Conclusion: While the evidence is not conclusive, it is more likely that clinical trials have a positive rather than a negative effect on the outcome of patients. In the limited data available, the effect seems to be larger in trials where an effective treatment already exists and is included in the trial protocol.

Recommendation: That carefully researched treatment protocols, and monitoring of outcomes, be used for all patients, not just those in trials.

Introduction

It is widely recognized that using well-conducted randomized clinical trials (RCTs) to evaluate the efficacy of treatments is beneficial to society in the long run. Harmful or ineffective treatments can be dropped and replaced by effective ones. However, in this article we are concerned with the short-term effects of RCTs—the effects before the results are known, essentially “side effects” of RCTs. We describe the results of a systematic review of direct empirical evidence relating to outcomes in patients likely to be affected by an RCT. The aim is to summarize and analyze these data to examine whether the existence of an RCT is, on average, beneficial (or harmful) to patients affected by it (i.e., to see whether there is a “trial effect”).

Our experience is that researchers often assume the existence of strong empirical evidence that trial participation is beneficial. However, our systematic literature search, completed in August 1996, located only one review [1], which provided some weak evidence for the existence of a beneficial trial effect in participants. This article updates that review and goes further by examining possible causes. In an attempt to make the article non-repetitive, clear and easy to follow, we have adopted a more narrative structure. First we set out in some detail what might be meant by a “trial effect” and how it might be estimated. We then briefly describe the articles we found, and summarize the evidence for trial effects, as reported by the authors. We then analyze data from the articles in an attempt to shed light on whether a “protocol” effect could be important in practice. Finally we reflect on the implications. The full systematic review and analysis can be found elsewhere [2].

In fact there might be a variety of different trial effects—for participants, for refusers, for those not offered entry (but receiving treatment from trial clinicians)—and the effect could be beneficial for one group and simultaneously harmful for another. It is not difficult to think of reasons why the existence of an RCT might be beneficial or harmful. For instance, if trial participation gives an improved chance of receiving a new more effective treatment, or if trial clinicians become better informed, or more careful (because they feel under observation), or are required to follow a carefully researched protocol for those in the trial, or if trial participation simply makes patients feel more useful—then outcomes might be improved. Conversely, if patients find the consent process traumatic, or if it results in loss of faith in clinicians or treatments [3], or if trial participation results in reduced access to better treatments (clinicians recruiting to a trial do not always tell patients when they think one treatment is better) [2], then outcomes could be worsened. The questions thus require an empirical answer. Further, it may be possible to establish the size of effects resulting from different “components” of RCTs—from differences in treatment efficacy, from other differences in care, from psychologically mediated differences. The existence, direction, and source of trial effects are of course highly relevant to debate in medical ethics, and could impact on normal medical practice as well as research practice. As many patients are potentially affected by RCTs each year, there is also immense public interest.

Obviously, to make a judgment about whether a trial has affected outcome requires comparison of data from a “trial group” of patients and from some “non-trial group.” The ideal way to estimate overall trial effect would be to randomize clinicians (without telling them) to either recruit patients to a clinical trial or not, and then compare outcomes for all their patients. Such a study is plainly infeasible. Unassailable evidence from prospective studies that trials are beneficial or harmful is thus unlikely, and we are inevitably going to have to rely on observational studies. There are well-known advantages to having a carefully chosen concurrent control group, rather than simply using historic data. In the latter case, any apparent trial effect could be confounded by the natural course of the disease [4], by seasonal change, or by regression to the mean, for example. There are a variety of different, concurrent, “trial” and “non-trial” groups that could be compared in observational studies (see Table 1).
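To make the historic-control concern concrete, the short simulation below (our own, purely illustrative construction; none of the numbers come from the reviewed studies) shows how regression to the mean alone can manufacture an apparent trial effect when outcomes under a trial are compared with a historic series:

```python
import numpy as np

rng = np.random.default_rng(0)

# Purely illustrative: a spurious "trial effect" created by regression to the
# mean when a trial's outcomes are compared with historic controls.
n = 2000
true_risk = rng.normal(0.30, 0.05, n)        # each patient's underlying event risk
year1 = true_risk + rng.normal(0, 0.05, n)   # observed rates, year 1 (risk + noise)
year2 = true_risk + rng.normal(0, 0.05, n)   # fresh observations, year 2, same care

# Suppose a trial is launched where year-1 results happened to look worst:
bad_year1 = year1 > np.quantile(year1, 0.5)

print(f"historic controls (year 1): {year1[bad_year1].mean():.3f}")
print(f"'trial' period (year 2):    {year2[bad_year1].mean():.3f}")
# Year 2 looks better although care is unchanged: part of the year-1 excess
# was noise, and noise does not recur.
```

A carefully chosen concurrent control group avoids this particular artifact, though not the selection biases discussed below.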

We want to estimate the true differences in outcome that result from an RCT (or perhaps from one of the components of an RCT). Table 2 lists possible components contributing to a trial effect.

A “treatment effect” would benefit participants if newer, more experimental treatments tested in trials tended to be better than more widely used alternatives. For example, one study [5] (and there are not many [6]) looked at published RCTs comparing innovative and standard surgical procedures. It found that innovations were apparently better than the standard procedure about one-third of the time, although this proportion is likely to vary by clinical area and over time.

The “protocol effect” is so called because the main treatment regimens in a trial are usually carefully described in a trial protocol. Grimshaw and Russell [7] found evidence that use of explicit guidelines (outside trials) does improve both process and outcome.

Any “care effect” would be difficult to estimate separately from the protocol effect, but would derive from such things as extra follow-up, or extra nursing cover, for data gathering.

“Hawthorne effects” are due to changes in patient or clinician behavior, on being involved in a trial, due to increased knowledge or interest or due to feeling “observed.” Such changes in behavior might also differ in extent between participant and non-participant patients of recruiting clinicians.

“Placebo effects,” which arise from the impact of the consent process on patients, might be different for the sorts of patients who tend to agree to participate than for those who tend to refuse.

Scope for a protocol, care, or Hawthorne effect is not universal—it would seem to logically require the existence of an effective routine treatment, which can be given more, or less, effectively.

If there is no known effective treatment routinely available, then, on the face of it, there is no mechanism by which a protocol, care, or Hawthorne effect could operate, at least in terms of clinical outcome (rather than quality of life). This could, however, overlook certain non-specific aspects of care, such as quality of nursing, which may in fact affect outcome, but which have not been evaluated and so are not known to be effective treatment.

If the trial is of a treatment subsequent or additional to routine treatment, and routine treatment occurs outside the trial, then there may be no scope within the trial for “improved routine care” unless aspects of the trial protocol also contribute to and improve on “routine care.”

In comprehensive cohort studies [8], the trial protocol is also applied (except insofar as randomization) to the non-trial comparison group. Obviously, we would not then expect to observe protocol, care, or Hawthorne effects when comparing the two groups.

There are two different questions which knowledge of the various component effects would help answer.

  • 1. The narrower question is “would a particular patient, who has just been offered entry into a trial, be wise to accept the offer?” Such a patient's clinician is clearly already recruiting for a trial, and the patient has already been exposed to the informed consent process. Hawthorne and placebo effects are thus, respectively, of lesser and of no relevance to the decision.

  • 2. The broader question is “what is the effect of the existence of the RCT on current patients?” In this case all the trial effect components in Table 2 are clearly relevant.

Anything which systematically differs between the trial group and the comparison non-trial group, and which is not intrinsically caused by the trial, may cause a bias (see Table 3). Sources of bias of concern to us are thus conceptually similar to sources of bias within RCTs [9], though different in their particulars. For instance, if initial prognosis were better in the trial group than in the comparison group, then any simple comparison would be biased (in favor of a positive trial effect)—a patient selection bias. Similarly, if patient deaths (say) were recorded more reliably (i.e., not missed so often) in the trial group, then the comparison would be biased (this time in favor of a negative trial effect)—a detection bias. However, if asking patients for informed consent were to prove very demoralizing and produce worse outcomes in the trial group, this would not be a source of bias for us, since it would be an integral part of the trial effect we are looking for. Some errors of execution in a trial might cause bias within the trial, but not be of great concern to us. For instance, if an incidental co-intervention (like nursing care) were better in one arm of the trial than in the other (and better than outside the trial), this might result in a biased trial, but any improvement in outcomes would properly contribute to the trial effect we are interested in.
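One way to make this distinction concrete (the notation is ours, not the authors') is to write the observed trial/non-trial difference as the sum of the trial-effect components of Table 2 and the bias terms of Table 3:

\[
\hat{\Delta}_{\mathrm{observed}}
  = \underbrace{\Delta_{\mathrm{treatment}} + \Delta_{\mathrm{protocol}} + \Delta_{\mathrm{care}}
     + \Delta_{\mathrm{Hawthorne}} + \Delta_{\mathrm{placebo}}}_{\text{trial effect (Table 2)}}
  + \underbrace{\beta_{\mathrm{patient\,sel.}} + \beta_{\mathrm{clinician\,sel.}}
     + \beta_{\mathrm{detection}}}_{\text{bias (Table 3)}}
\]

On this reading, a demoralizing consent process contributes to \(\Delta_{\mathrm{placebo}}\), part of the quantity we want to estimate, whereas unreliable recording of deaths contributes to \(\beta_{\mathrm{detection}}\), a bias to be designed or adjusted away.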

In the absence of an ideal study, there will be serious concerns about systematic differences between clinicians who recruit for trials and those who do not. If clinicians who recruit for trials are, on average, better clinicians, then patients in trials may appear to do better simply because they are being treated by better clinicians. A “within-clinician” comparison (e.g., between participating and non-participating patients of the same clinicians) could help to reduce clinician selection bias. However, this would mean the effects of trials on clinician performance could not be assessed. Another problem with within-clinician comparisons is likely to be patient selection bias, since participating patients are self- (and clinician-) selected. Use of admission criteria (e.g., only comparing those non-participants who were eligible), and allowing for any known prognostic factors in the analysis, could help reduce the likely size of such biases. A comparison of those actually enrolled with those who refused would ensure all are eligible, and adjustment for some prognostic differences may be possible. However, two such self-selected groups are likely to differ, at least psychologically, in ways of unknown prognostic importance. Also note that a participant/refuser comparison cannot assess the effects of asking for informed consent.
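As a sketch of how allowing for known prognostic factors can work (an invented toy model; the variable names and probabilities are ours, not drawn from any study), the simulation below constructs a participant/refuser comparison in which prognosis drives enrollment, so the naive difference looks favorable even though the trial has, by construction, no effect; stratifying on the prognostic factor removes the artifact:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy model: prognosis influences both enrollment and outcome, and the trial
# itself has NO effect, yet the naive participant/refuser comparison looks good.
n = 4000
good_prognosis = rng.random(n) < 0.5
participates = rng.random(n) < np.where(good_prognosis, 0.7, 0.3)
died = rng.random(n) < np.where(good_prognosis, 0.10, 0.30)

naive = died[participates].mean() - died[~participates].mean()
print(f"naive difference (deaths):     {naive:+.3f}")   # spuriously favorable

# Adjustment by stratification: compare within prognosis strata, then average
# the stratum-specific differences, weighting by stratum size.
adjusted = sum(
    (died[(good_prognosis == g) & participates].mean()
     - died[(good_prognosis == g) & ~participates].mean()) * (good_prognosis == g).mean()
    for g in (True, False)
)
print(f"prognosis-adjusted difference: {adjusted:+.3f}")  # close to zero
```

Stratification can only remove bias from factors that are known and measured; the psychological differences between agreers and refusers noted above would survive any such adjustment.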

Thus, broadly, comparisons between groups of patients of the same clinicians will not include any Hawthorne effect, may not include any placebo effect, and will be less exposed to clinician selection and study-induced biases. Comparisons between patients from different groups of clinicians (i.e., recruiting and non-recruiting) will potentially include all components of a trial effect, but will be potentially exposed to all biases, although perhaps less to patient selection bias depending on circumstances. Hawthorne and placebo effects are thus confounded with clinician selection and study-induced biases.
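That confounding can be made explicit with one more hypothetical simulation (all numbers invented): in a between-clinician comparison, the data deliver only the sum of the Hawthorne effect and the clinician-selection difference, and no reanalysis of the same data can apportion the total between them:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical numbers: recruiting clinicians are 4 percentage points better
# (clinician selection bias), and their patients gain a further 3 points from
# feeling observed (Hawthorne effect). Only the 7-point sum is visible.
n = 20000
seen_by_recruiter = rng.random(n) < 0.5
p_death = 0.30 + np.where(seen_by_recruiter, -0.04 - 0.03, 0.0)
died = rng.random(n) < p_death

diff = died[seen_by_recruiter].mean() - died[~seen_by_recruiter].mean()
print(f"between-clinician difference: {diff:+.3f}")   # about -0.07
# Any split of -0.07 into "skill" and "Hawthorne" is equally consistent with
# these data; separating them needs a different design, not a bigger sample.
```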

In short, there are many different comparisons one might make to investigate different components of a trial effect, each with different advantages and disadvantages. However, in a review, one can only use what is already available in the literature.

Literature search

We undertook a systematic search of the literature for relevant data as part of a broader systematic review of ethical issues in designing and conducting RCTs. Details of our search can be found elsewhere, where we also describe the studies found and their results [2]. It is true to say that finding relevant papers was not straightforward, as there appears to be no set of search terms that is particularly sensitive and specific. Many of the papers found were located via reference lists in

Is there a trial effect?

We report the results as given by the investigators themselves (i.e., comparisons at article level). In eight of the 14 articles, patients' clinical outcome was reported as being significantly better, statistically speaking, among trial participants than among non-trial controls [14–19, 22, 24]. In six articles the results failed to reach statistical significance. Three of these, however, reported a favorable overall trend with trial entry [8, 12, 20]. A fourth [23] claimed to show

What do the data tell us about the origin of trial effects?

In many of the articles, the overall comparison between trial and non-trial results was reported simply as a P-value, which gives little or no information on the size or precision of that difference. In addition, many articles grouped trials together for analysis, preventing examination of differences between these trials. We have therefore gone beyond the description of results as reported by the authors, and sought quantitative estimates (with precision), at as disaggregated a level as

Discussion and conclusions

Our first conclusion is that there is little good quality evidence available. The evidence there is comes mainly from cancer trials and conclusions should perhaps be restricted to such trials, or at least used with extreme caution in other areas. The effect of trials could be very different in other disease areas—especially where the disease processes, treatments, and outcomes are very different—in psychiatry for instance. Our analysis, albeit somewhat subjective and made more difficult by

Acknowledgments

This study was supported by the Methodology section of the U.K. National Health Technology Assessment programme. We thank Dr. Jenny Hewison, Dr. Chris Hyde, Ms. Jennifer Jackson and Dr. C.A. Stiller for their helpful comments.

References (26)

  • S.J.L. Edwards et al. Ethical issues in the design and conduct of randomised controlled trials. Health Tech Assess (1998)

  • J.P. Gilbert et al. Progress in surgery and anesthesia: benefits and risks of innovative surgery

  • I. Chalmers. What is the prior probability of a proposed new treatment being superior to established treatments? Br Med J (1997)