Controversial choice of a control intervention in a trial of ventilator therapy in ARDS: standard of care arguments in a randomised controlled trial
- Correspondence to: H Mann, Department of Radiology, Program Associate, Division of Medical Ethics, 1A71 University Hospital, 50 North Medical Drive, Salt Lake City, UT 84132, USA
- Received 26 September 2004
- Accepted 17 November 2004
- Revised 14 November 2004
When evaluating an innovative intervention in a randomised controlled trial (RCT), choosing an appropriate control intervention is necessary for a clinically meaningful result. An RCT reported in 2000 addressed the relative merits of two tidal volume ventilatory strategies, 6 ml/kg (innovative) and 12 ml/kg (control), in patients with acute respiratory distress syndrome. Critics claim that the 12 ml/kg volume did not represent the clinical practice standard at that time, and that lower tidal volumes had been used in some patients prior to randomisation. The trialists responded that current practice involved the use of a broad range of tidal volumes, including 12 ml/kg. Appropriate control interventions for RCTs can be ensured by: a systematic review of the relevant literature; a formal survey of expert clinicians; and publication of the proposed research protocol to solicit critical appraisal. A global survey of experts during the RCT’s design stage would have been of probative value in determining the appropriate control tidal volume. Hypothetical, but plausible, results of such a survey are presented and examined to demonstrate the value of this method.
The benefits, risks, burdens and effectiveness of a new method should be tested against those of the best current prophylactic, diagnostic, and therapeutic methods. (World Medical Association, Declaration of Helsinki1).
A randomised controlled trial (RCT) is performed to resolve uncertainty about the relative merits of medical interventions. When an innovative intervention is studied, an appropriate choice of a control intervention is necessary to ensure a clinically meaningful result. An RCT to evaluate the relative merits of different tidal volumes used to ventilate patients with acute respiratory distress syndrome (ARDS) was conducted by the ARDS Network (ARDSNet) and published in 2000.2 The Network was established in 1994 as a Contract Program associated with the National Heart, Lung, and Blood Institute of the National Institutes of Health, and is a consortium of 19 clinical centres. A steering committee composed of a principal investigator from each centre reviews and approves proposed trials in patients with ARDS. With respect to the ventilator trial, the investigators proposed, on the basis of data from physiological and animal experimentation, and uncontrolled observations in clinical care, that a low volume strategy might be superior to the higher volumes often used in clinical practice. An RCT was proposed to resolve this uncertainty by randomising participants between two tidal volume strategies: 6 ml/kg and 12 ml/kg. The trial was conducted at 10 university centres after each centre’s institutional review board approved the trial.
Intense controversy about the scientific and ethical validity of the trial followed the publication of an analysis that questioned the choice of the 12 ml/kg control value. The publication was a meta-analysis of trials testing low tidal volumes in patients with ARDS, one of which was the ARDSNet trial, and its primary contention was that the trials demonstrating a beneficial effect did so because the control tidal volumes were too high, and did not reflect “current best practice standards at the time”.3 After the article’s authors communicated their concerns to the US Office for Human Research Protections (OHRP), that office was obliged to investigate the conduct of the trial pursuant to its compliance monitoring obligations.4 Because the complainants raised similar concerns about the research design of an ongoing ARDSNet RCT concerning the administration and monitoring of intravenous fluids in patients with ARDS, this trial was suspended at the onset of the investigation. Details concerning the complaint and subsequent deliberations and actions involving the OHRP have been the subject of a published report.5
Publication of the critical meta-analysis was followed by intense criticism by a research advocacy group,6 substantial published correspondence,7,8,9,10 commentary and editorials in journals,11,12 statements by representatives of professional societies,13 and a request by the OHRP for “further discussion within the scientific and bioethics communities about issues regarding appropriate research design in the absence of a standard of care”.14 A chronology of the salient events in this controversy is provided in table 1.
In this essay I will evaluate the contending standard of care arguments in the ARDSNet trial and propose that they reflect substantially disparate interpretations of information available at the time the trial was designed. I will review specific conceptualisations of standard of care in medical practice and how these relate to the choice of a control intervention for an RCT in general, and for the ARDSNet trial in particular. I will clarify specific methods whereby an appropriate choice of control intervention for such a trial may be pursued: a systematic review of the relevant literature and surveying expert clinicians in the field of interest.
UNCERTAINTY AND TRIAL DESIGN IN THE ARDSNET TRIAL
Critically ill patients with ARDS require the application of multiple medical interventions. ARDS experts believed that one component of assisted ventilation—the degree of alveolar distension—may significantly affect clinical outcome. The ARDSNet trialists proposed to resolve this uncertainty by isolating the effects of tidal volume in an internally valid efficacy RCT, the results of which would be convincing to the medical community. Sackett15 has explained how the confidence in the conclusion of an RCT is the ratio of the magnitude of the signal to the magnitude of the noise times the square root of the sample size. Because signal derives from differences between the effects of the interventions being compared (control event rates), the trialists chose 12 ml/kg as the control volume to ensure a “strong” signal, after considering the range of tidal volumes used in clinical practice. In general, the greater the anticipated differences between the specified outcome measures, the smaller the required sample size, all else equal. To minimise noise, non-uniform application of non-experimental cointerventions (typically multiple in critical care trials) was controlled by the use of a predefined computerised management protocol, the application of which had been previously described.16–18 The trial’s results demonstrated the relative superiority of the low volume strategy: the primary outcome of mortality in this group was 31% compared with 39.8% in the control arm.
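The trade-off described above, in which a larger anticipated difference between the arms permits a smaller sample, can be illustrated with a short calculation. The following is a minimal sketch and not the ARDSNet trialists' own power calculation: it uses the textbook normal-approximation formula for comparing two proportions, the two-sided significance level of 0.05 and power of 0.80 are assumptions chosen for illustration, and the function name `sample_size_per_arm` is hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_arm(p_control, p_treatment, alpha=0.05, power=0.80):
    """Approximate participants needed per arm to compare two proportions.

    Textbook normal-approximation formula; alpha and power are
    illustrative assumptions, not values taken from the trial protocol.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p_control + p_treatment) / 2
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p_control * (1 - p_control)
                        + p_treatment * (1 - p_treatment))
    ) ** 2
    return ceil(numerator / (p_control - p_treatment) ** 2)


# Mortality figures reported for the two trial arms: 39.8% versus 31%.
n_reported_gap = sample_size_per_arm(0.398, 0.31)

# Hypothetical smaller gap, used only to show how the requirement grows.
n_smaller_gap = sample_size_per_arm(0.398, 0.35)

print(n_reported_gap, n_smaller_gap)
```

Under these assumptions, narrowing the anticipated mortality difference raises the number of participants required in each arm, which is the sense in which a control volume chosen to yield a "strong" signal reduces the required sample size.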
CRITICISMS OF THE TRIAL
The critics contended that the trial results demonstrated the superiority of 6 ml/kg over 12 ml/kg, but not the relative merits of the lower tidal volume compared with “routine” care, which allegedly involved the use of lower tidal volumes.3 This reference to routine care followed a publication in 2000 which revealed that, prior to randomisation, patients in the participating centres were ventilated with a mean tidal volume of 10.3 (SD 2) ml/kg, and that the mortality in a group of eligible non-participating patients was lower than in those randomised to the 12 ml/kg arm—31% versus 39.8%.19 The critics also claimed that the mortality in this non-participating group was equivalent to the mortality reported in ARDS patients in 1996,20 the year the ARDSNet trial began. Finally, it was asserted that “Overall, this study design may have resulted in substantial numbers of control patients receiving inferior treatment in the ARDSNet trial”.3 In published correspondence,7,8,9,10 the ARDSNet trialists contended that each one of these criticisms was invalid, and I discuss their responses in the following sections.
CONTENTIONS ABOUT STANDARD OF CARE IN MEDICAL PRACTICE
The ARDSNet trialists contended that the trial was not designed to compare the outcomes of patients who received lower tidal volumes with those who received routine care, because routine care encompassed a broad range of tidal volumes and inspiratory pressures. They contended that, “at the time the trial was designed, nobody knew if the traditional or lower tidal volume approach was superior”.8 The unimodal distribution of tidal volumes used by clinicians at each of the participating centres9 (fig 1), with a peak centred at 10.5 ml/kg, was cited by the critics as evidence of a practice standard. The trialists responded that the distribution may be interpreted as an expression of uncertainty on the part of a sufficient number of clinicians concerning the relative merits of lower and higher volume strategies: in anticipation of the results of the ARDSNet trial, many clinicians at the participating centres might have chosen to use a tidal volume intermediate between those employed in the trial.
In published correspondence following the publication of the critics’ article, some expert clinicians defended the choice of the 12 ml/kg tidal volume: Carmichael, coauthor of a report of a 1992 survey of critical care specialists, disputed the existence of a “best” or “current” practice standard by referring to a “wide disparity in the selection of tidal volumes …” and the fact that “more than half of respondents reported using a tidal volume larger than 10 ml/kg”.21 Petty, an acknowledged ARDS expert and author of a state of the art review of ARDS in 1990, also disputed the contention that intermediate tidal volumes represented a standard of care by stating “I don’t believe this is true, based on my own observations while visiting many intensive care units around the country and in Europe over many years, including the decade of the nineties when the ARDS trial was done. Most units employed high tidal volumes according to our original recommendations”.22 It is notable that these defenders’ perceptions of standard of care relate to medical practice beyond that in the participating ARDSNet centres. The critics countered that these opinions did not accurately “reflect the tidal volumes most commonly used by clinicians for managing patients with ARDS in 1996, the time that the ARDSNet trial of low tidal volume began”.10
A number of commentators have documented a historical trend towards the use of lower tidal volumes in the 1990s, but these observations were described in articles published between 1999 and 2003,23–25 that is, after the trial had begun. Acknowledgment of this trend does not permit the identification of a point in time at which a lower volume, or a certain specified range of volumes, would be considered the extant standard of care. Recognising the disparate beliefs of the critics and respondents quoted above, how may the design of this kind of RCT be informed so as to permit the acquisition of important knowledge while assuring that the interests of patient-participants are protected?
ACKNOWLEDGMENT OF UNCERTAINTY AND SECURING PARTICIPANTS’ MEDICAL INTERESTS IN RCTS
The impetus for an RCT originates in a state of epistemic uncertainty in the community of expert physicians concerning the relative merits (benefits and harms) of comparator interventions. This state has been described as one of clinical equipoise.26 The RCT is a formal means of resolving this uncertainty and should explicitly be designed to disturb equipoise. This state of uncertainty is also of interest to research ethics committees, which must approve a proposed trial before it may be offered to physicians and patients. Once it is available, participating physicians have to decide whether to offer it to particular patients. Finally, each patient (or a proxy) has to decide whether to accept an offer of enrolment. When a patient, informed about what is known about the benefits and harms of the comparator interventions, is also “maximally uncertain” about their relative merits—when the patient is in a state of equipoise—randomisation represents a reasonable treatment choice under conditions of uncertainty.27–29 Thus, the goals of the trialist (the acquisition of scientific knowledge) and the patient-participant (the best treatment) may concurrently be accomplished.30
RCTs may involve investigational or commonly used interventions. When an innovative intervention is being evaluated, the appropriate choice of a control intervention is critical. An inappropriate choice may nullify the clinical value of a trial that otherwise meets the requirement of internal validity. For example, in an RCT in which the control intervention is a drug, an inappropriately low dose or method of administration of the control agent will violate the equipoise principle.31 Comparator bias describes a situation in which a new treatment is compared with an existing treatment that is known to be inferior or when the treatment is compared against nothing when effective treatments already exist.32 What methods should be used to avoid comparator bias? In a previous communication,33 Djulbegovic and I elaborated on this issue, and the methods that may be used include:
- systematic review of the relevant literature
- formal survey of relevant clinical experts
- publication of the trial’s protocol to solicit critical appraisal of its rationale and design.
A SYSTEMATIC REVIEW OF THE RELEVANT LITERATURE
The choice of a control intervention should be justified by such a review. In drug RCTs the proper dose and method of administration of the control intervention are relevant, and these may be determined from a systematic literature review. A review is also pertinent when a placebo-only control arm is proposed: research synthesis should establish that there is no known effective therapy for the medical condition involved. The ARDSNet trial differed in an important respect: the intervention being evaluated was a continuous variable, tidal volume. How would such a review have informed the design of the ARDSNet trial? A systematic review of the ARDS related literature would have yielded the aforementioned survey of expert clinicians, conducted in 1992, concerning the management of patients with ARDS.34 The survey questionnaire was sent to 3264 members of the American Thoracic Society Assembly of Critical Care Medicine. The majority (65%) of the 1023 respondents were board certified internists and certified in pulmonary and/or critical care medicine, practising in the USA and 24 countries abroad. Nearly equal proportions practised in university teaching (43%) and community hospitals (42%). The survey revealed a division of opinion in the expert clinical community concerning the choice of tidal volumes: 45% and 48% of respondents thought tidal volumes of 5–9 ml/kg and 10–13 ml/kg, respectively, most appropriate in patients with ARDS. Although not cited in the trial’s protocol,35 the survey was apparently used by the ARDSNet trialists when determining a control tidal volume.
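The arithmetic behind the 1992 survey figures can be made explicit. The sketch below is illustrative only: it assumes that both percentages refer to all 1023 respondents, and it uses a simple normal-approximation interval, a choice made here for illustration and not taken from the survey report. The helper name `normal_ci` is hypothetical.

```python
from math import sqrt

SENT = 3264       # questionnaires sent, as reported for the 1992 survey
RESPONDED = 1023  # respondents


def normal_ci(p, n, z=1.96):
    """Approximate 95% interval for a proportion (normal approximation)."""
    half_width = z * sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


response_rate = RESPONDED / SENT

# Assumption: both percentages are fractions of all 1023 respondents.
low_range = normal_ci(0.45, RESPONDED)   # thought 5-9 ml/kg most appropriate
high_range = normal_ci(0.48, RESPONDED)  # thought 10-13 ml/kg most appropriate

# Descriptive check only: both shares come from the same respondents,
# so overlapping intervals are not a formal test of their difference.
intervals_overlap = low_range[1] >= high_range[0]

print(f"response rate: {response_rate:.1%}")
print(f"5-9 ml/kg:   {low_range[0]:.1%} to {low_range[1]:.1%}")
print(f"10-13 ml/kg: {high_range[0]:.1%} to {high_range[1]:.1%}")
print("intervals overlap:", intervals_overlap)
```

Read this way, fewer than one in three of those surveyed responded, and the two shares are close enough that neither range of tidal volumes can be said to have commanded a clear majority among respondents.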
THE PROBATIVE VALUE OF A FORMAL SURVEY OF EXPERT CLINICIANS
Formal surveys of expert clinicians have been used to confirm the existence of uncertainty necessary to propose an RCT,36 and to document physicians’ prior beliefs about an intervention’s relative efficacy when bayesian methods were used to conduct interim analysis in an ongoing RCT.37 In a cardiology RCT in which the efficacy of a platelet inhibitor (eptifibatide) was compared with a placebo, a survey of a group of interventional cardiologists to evaluate the existence of a putative standard of care was apparently determinative in persuading the Food and Drug Administration to lift the “clinical hold” the agency initially put on the proposed trial, because of its concern about the appropriateness of the placebo controlled design.38
Allegations about breaches of the standard of care are a central feature of medical malpractice actions. In the courtroom, attorneys elicit testimony from expert witnesses for the plaintiffs and defendants according to formal rules of evidence intended to establish facts salient to the action. Evidence is considered “probative”, and hence admissible, if it logically tends to prove the proposition for which it is offered. A survey of relevant clinicians has been proposed as a method of securing evidence for or against a putative standard of care in such situations.39,40
Analogously, ascertaining the informed beliefs and practices of expert clinicians may enable an appropriate choice of control intervention for an RCT, particularly when, as in the ARDSNet trial, the notion of a “standard of care” may arise. Indeed, a requirement for an antecedent survey of the current practice of clinicians at participating centres was suggested by the trial’s critics10 and the OHRP.4 This would certainly help to discern the current practice patterns of physicians at participating centres, but it is not clear that this is necessarily an appropriate strategy for determining a putative standard of care against which to compare the low volume strategy. As previously described, the salience of a survey limited to practices in the participating centres is disputable. Is there another way to conceptualise standard of care arguments that may be pertinent to the ARDSNet trial?
The notion of a standard of care in clinical research has been particularly controversial in the design of placebo controlled trials conducted in developing countries by investigators and sponsors from developed nations. London has explored the controversy by distinguishing between two reference points—local versus global; and two conceptual interpretations of standard of care arguments—de facto and de jure.41 In the de facto interpretation, the standard of practice is set by the actual medical practices of the relevant reference community. In the de jure interpretation, the standard is determined by the judgement of medical experts in the reference community as to which interventions have been proved effective. The notion of proof relates to a body of scientific evidence, particularly from controlled clinical trials, supporting an intervention’s efficacy in a population of interest. The epistemic character of the de jure standard is particularly relevant to the notions of uncertainty and clinical equipoise, and it is also consistent with the greater evidential weight generally accorded to the results of controlled trials compared with uncontrolled observational data.42
Prior to the conduct of the ARDSNet trial, a body of evidence supportive of a certain tidal volume, or a certain narrow range of tidal volumes, as a de jure standard of care did not exist. Because the tidal volumes employed by clinicians were known to be diminishing over time, and the last survey had been conducted about four years prior to the initiation of the ARDSNet trial, the trialists could have repeated the survey in a representative sample of clinicians from the global community of ARDS experts. The results might have been of substantial probative value from the de jure perspective: in addition to enumerating the tidal volumes they actually employed in practice, the respondents should have been asked to clarify the reasons for their choices. A survey should attempt to discern whether the choice potentially represents preference neutrality or an explicit choice grounded in an assessment of published evidence pertaining to the clinical issue at hand. This distinction has been made by Ashcroft,43 as follows: “Equipoise is not simply preference neutrality. The physician may not prefer treatment A to treatment B, or vice versa. But what is important is not whether this is his or her treatment preference, but whether he or she has reason to prefer one treatment to the other. Preferences for treatment should be rationally corrigible.”
Hypothetical results of a survey of applied tidal volumes are presented in fig 2. I will now present reasonable interpretations of these results. The results of survey A do not reveal any tendency to a favoured tidal volume in the range evaluated. The proposed choice of 12 ml/kg may be affirmed as acceptable. Survey B reveals a notable and tight concentration of applied tidal volumes around 10 ml/kg: 84% of respondents used tidal volumes between 9 ml/kg and 11 ml/kg. This result would compel the trialists to reconsider the choice of control volume. Use of 10 ml/kg as a control volume would still permit the conduct of a trial with sufficient signal magnitude to address the uncertainty of interest and, if the results convincingly demonstrate the superiority of the lower tidal volume, an evidential claim for the latter as a de jure standard of care is established. The community of experts would assess the results in the light of other evidence—including other controlled trials—and such research synthesis and accompanying debate would properly address the merits of the claim. Survey C closely mirrors the range of tidal volumes actually used in the ARDSNet trial participants before randomisation and does not affirm 10 ml/kg as a standard of care against which the 6 ml/kg strategy should be compared: 54% of respondents used tidal volumes between 9 ml/kg and 11 ml/kg, and higher volumes were used by 23% of respondents.
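The summary statistics used in these interpretations, namely the share of respondents within 9 to 11 ml/kg and the share above it, can be computed from any tabulated survey. The sketch below is a minimal illustration: the respondent counts are hypothetical and were constructed only so that the shares match the percentages quoted for surveys B and C; they are not the data plotted in fig 2. The function name `band_shares` is likewise an assumption.

```python
def band_shares(counts, low=9.0, high=11.0):
    """Return the shares of respondents below, within, and above a band.

    counts maps a reported tidal volume (ml/kg) to a number of respondents.
    """
    total = sum(counts.values())
    below = sum(n for v, n in counts.items() if v < low)
    within = sum(n for v, n in counts.items() if low <= v <= high)
    above = sum(n for v, n in counts.items() if v > high)
    return below / total, within / total, above / total


# Hypothetical counts per 100 respondents, built only to reproduce the
# percentages quoted in the text; not the data shown in fig 2.
survey_b = {8: 8, 9: 20, 10: 44, 11: 20, 12: 8}
survey_c = {6: 5, 7: 8, 8: 10, 9: 16, 10: 22, 11: 16, 12: 13, 13: 10}

for name, counts in (("B", survey_b), ("C", survey_c)):
    below, within, above = band_shares(counts)
    print(f"survey {name}: {within:.0%} within 9-11 ml/kg, {above:.0%} above")
```

What share within the band would amount to a standard of care remains a matter of judgement; the code only reports the shares and does not set such a threshold.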
SHOULD A THIRD ARM HAVE BEEN USED IN THE ARDSNET TRIAL?
Whether a third arm using 10 ml/kg, reflective of “routine care”, should have been included in the trial is another issue that has been raised.44 This proposal should have been seriously considered if, based on other information, it was plausible that an intermediate tidal volume might have been superior to both volumes being compared. Miller and Silverman have discussed this possibility, and the pros and cons of a third arm, in the ARDSNet trial.45 Inclusion of a “routine care” control group in the trial would have necessitated enrolling additional participants, increasing the cost and complexity of the trial, and complicating its analysis. Because the likelihood that a “routine care” group would fare better than either the 6 ml/kg or the 12 ml/kg group could not have been predicted with any certainty antecedent to the initiation of the trial, assertions about this are necessarily influenced by hindsight bias.
However, in an additional informative communication concerning control group selection in critical care RCTs in general,46 Silverman and Miller consider further the merits of RCTs that, like the ARDSNet trial, compare two contrasting strategies. They argue that “as the contrasting strategies become less representative of the broad range of standard care practices, the less assurance is provided that both study groups will have mortality rates that do not exceed standard practices. Accordingly, the greater becomes the need to include a representative standard of care control group to permit a trial to detect whether research subjects are being exposed to excessive harms”. In judging the extent to which the contrasting strategies are representative of standard practices, they state that “specification of any numerical percentage would be arbitrary”. The results of a survey, as mentioned above, would substantially inform a judgement about the need for inclusion of a “representative standard of care control group”.
Determining the implications of a survey’s results will always entail judgement, which may engender disagreement. Nevertheless, I believe its value outweighs such concerns. The results of the survey should be incorporated in the research protocol submitted for review by research ethics committees, whose actions in this matter have been criticised.14,47 A relevant summary of de facto practices may also be presented to potential trial participants in the information sheet for the trial.
OBJECTIONS TO THE CONDUCT OF SURVEYS
I anticipate that concerns about the expense and practicability of conducting such surveys will arise. However, unlike a systematic literature review, a survey will not be necessary for most RCTs. In others, a survey limited to a smaller number of experts may suffice. In any event, the adverse consequences of an inappropriate choice of control intervention justify this method.
Conducting informative controlled trials in a critical care environment is complex and challenging. Because mortality is relatively high and potential participants commonly have impaired decisional capacity, appropriate acknowledgment of uncertainty is critical prior to randomising patients among interventions in an RCT. Guidance for conducting trials is available,48 but this does not typically address uncertainty and standard of care issues in sufficient detail. Systematic methods are available that may be used to inform the choice of a control intervention, and the rationale for the choice should be explicated in the published report of the trial. Miller and Rosenstein49 have addressed the issue of explicitly reporting pertinent ethical issues to promote public accountability for clinical research. Editors and reviewers should require this of author-investigators. If a survey is used to inform the design of the trial and the choice of interventions, it should also be published.
J Mann and A London provided helpful critiques of draft versions of this essay.
AUTHOR’S STATEMENT AND CONTRIBUTIONS
HM conceived this essay, made necessary revisions, and approved the final version submitted.