
Assessing risk/benefit for trials using preclinical evidence: a proposal
Jonathan Kimmelman, Valerie Henderson
Biomedical Ethics Unit, McGill University, Montreal, Quebec, Canada
Correspondence to Dr Jonathan Kimmelman, STREAM (Studies of Translation, Ethics and Medicine), Biomedical Ethics Unit, McGill University, Montreal, Quebec, Canada H3A 1X1; jonathan.kimmelman{at}


Moral evaluation of risk/benefit in early phase studies requires assessing the clinical promise of a candidate intervention using preclinical evidence. Yet, there is little to guide ethics committees, investigators, sponsors or other stakeholders morally charged with making these assessments (‘evaluators’). In what follows, we draw on published guidelines for preclinical study design to develop a structured process for assessing the clinical promise of new interventions. In the first step, evaluators gather all relevant preclinical studies, assess the magnitude of treatment effects and determine clinical promise in light of various threats to valid clinical inference. In the second step, evaluators adjust the assessments of clinical promise from preclinical studies by examining how other agents in the same reference class—and supported by similar evidence—have fared in clinical development. Assessments of clinical promise can then be fed into the moral evaluation of risk and benefit in early phase trials. Though our approach has limitations, it offers a systematic and transparent method for assessing risk/benefit in early phase trials of novel interventions.

  • Animal Experimentation
  • Clinical trials
  • Research Ethics
  • Scientific Research
  • Stem Cell Research



All policies in trial ethics require a favourable balance of risk and benefit. In many research arenas, judgements about risk/benefit are supported by prior trials and clinical experience. In novel arenas, however, investigators, ethics review committees, sponsors and data-monitoring committees (‘evaluators’) must draw on preclinical evidence. Nevertheless, guidance on how evaluators should apply preclinical evidence to risk/benefit assessment is limited.

What follows centres on how evaluators should use evidence of clinical promise gathered in animals (‘efficacy studies’). We describe when efficacy studies are crucial for the ethical evaluation of trials, and argue that evaluators should directly assess efficacy evidence when contemplating early phase trials. We then propose a structured process for assessing clinical promise using efficacy studies. We close by describing the limitations of our approach.

Efficacy studies and risk/benefit

Two classes of preclinical evidence generally support clinical development. The first includes toxicology and pharmacokinetics studies, which aim to anticipate safety issues and tolerable doses for trials. The second is ‘efficacy studies’. These are designed to demonstrate ‘clinical promise’ in a proxy disease setting (we use ‘clinical promise’ in a narrow sense: the probability that an intervention will demonstrate safety and efficacy in human beings after a small number of trials). Generally, the most clinically informative efficacy studies are performed on live animals that most closely model human disease. Most of what follows pertains to efficacy studies.

Many ethics policies1,2 urge that human investigations be preceded by animal studies. These recommendations stem from the recognition that animal studies ground judgements about risk/benefit in three ways. First, they are designed to predict clinical effects, and exposing patients to an unproven and possibly noxious intervention is only defensible where there is justified belief in its promise. Second, they clarify the appropriate application of a drug, such as dose or indication. Third, they support the interpretation of trials; if drugs prove ineffective, researchers can use efficacy studies to determine why.3

Despite a patent relationship between efficacy studies and human protections, there is little to guide evaluators in assessing them. Some might hold that evaluation should fall solely to drug regulators, rather than to investigators, sponsors or ethics committees. There are two reasons to resist this view. First, drug regulators have detailed requirements for toxicology studies, but not for most efficacy studies. The Food and Drug Administration's (FDA's) guidance on Investigational New Drug (IND) submissions states that ‘lack of… potential effectiveness information should not generally be a reason for a Phase 1 IND to be placed on clinical hold’.4 The FDA does have more demanding requirements for cell and gene therapies,5 as well as for products licensed for human use without trials.6 Of course, the FDA can exercise discretion in the preclinical evidence it demands through, for example, pre-IND meetings with sponsors. Second, while the FDA is tasked with assessing risk, its mandate does not extend to weighing risk against knowledge value. Drug regulators lack the authority, and likely the resources and expertise, to balance the scientific utility of a trial against its burdens; instead, this responsibility is delegated to others. The FDA guidance for institutional review boards (IRBs) and clinical investigators states: ‘21 CFR 56.111(a)(2) requires the IRB to assure that the risks to subjects are reasonable in relation to the anticipated benefits. The risks cannot be adequately evaluated without review of the results of previous animal and human studies’. Many investigators and ethics committees may currently lack the expertise to review preclinical studies; we address this concern below. For now, we note that sponsors, investigators and others also bear moral responsibility for risk/benefit and thus constitute audiences for the structured evaluation process we propose.

Under existing policy, evaluators should weigh risks to subjects against benefits to them (if any) and to society. In early phase trials, both direct benefits and benefits to society are related to the clinical promise of an intervention. Thus, studies involving greater risk demand a greater belief in an intervention's clinical promise; studies that deliver very small doses (eg, microdoses) can demand less evidence of promise. At the opposite extreme are early phase studies employing active and prolonged exposures or involving invasive delivery; even in the presence of clinical need, very strong evidence of efficacy is needed to justify their risks. Somewhere in between are studies for which there is limited clinical evidence of promise. Consider drugs that have reached phase 2. Phase 1 studies will have supplied limited evidence of safety and perhaps pharmacodynamic evidence of activity. However, phase 2 studies deliver unproven drugs to larger cohorts of patients, often at doses close to the limit of tolerability;7 they are also often launched in medical indications that were not tested in phase 1. Despite such drugs having advanced into middle phases of testing, the moral justification of their risks continues to require an assessment of animal studies.

Reviewing efficacy studies

Though a review of efficacy studies is technically demanding, there are widely shared and cardinal considerations that evaluators can use. These derive from an understanding of what efficacy studies set out to demonstrate, and ways that they can mislead.

We propose a two-step, evidence-based process whereby reviewers might use preclinical efficacy studies to support inferences about clinical promise. In the first step, reviewers should use efficacy studies to estimate the clinical promise of a new drug. In the second step, reviewers should adjust their estimates based on clinical outcomes with other related drugs pursued on similar evidence (reference class outcomes). All else being equal, the stronger the clinical promise, the stronger the moral justification for riskier trials. Estimates of clinical promise can then be joined with other moral judgements in evaluating risk and benefit.

Step 1: evaluating clinical promise using efficacy evidence

Preclinical efficacy studies set out to provide evidence of clinical promise. They do this by supporting inferences that a candidate intervention causes treatment effects in a model setting, and that these cause-and-effect relationships generalise to clinical settings. All else being equal, interventions that cause larger effects in animals have greater clinical promise. However, inferences about clinical promise can be threatened in three different ways, and evaluation of preclinical studies begins by determining the extent to which these threats have been addressed during study design and execution.

First, preclinical efficacy studies may introduce biases or random errors that lead to spurious causal inferences; these are called ‘threats to internal validity’. For instance, unless blinded to treatment allocation, investigators committed to a hypothesis can unknowingly attend more closely to therapeutic responses in treated animals. Second, efficacy studies can fail clinical generalisation if investigators mischaracterise the relationship between a preclinical study and an ensuing trial. These mischaracterisations are called ‘threats to construct validity’. A researcher might mischaracterise a drug's utility for chronic human disease if it was tested only on animals during acute illness. Third, efficacy studies can fail clinical generalisation because something idiosyncratic about the test conditions fails to carry over to trials. These artefacts represent ‘threats to external validity’. A drug might fail to generalise because of differences in animal physiology or laboratory practices that interact with treatment effects.

The first task in assessing clinical promise is for evaluators to retrieve, thoroughly, all available and relevant efficacy studies on an intervention (investigators often reference preclinical evidence in trial brochures, but this information can be abbreviated and selective). Evaluators should then determine the magnitude of treatment effects and interrogate the studies to determine the extent to which they have addressed all three threats to valid clinical generalisation. We recently performed a systematic review of guidelines for designing clinically generalisable efficacy studies, and identified 26 guidelines pertaining to 11 broad disease areas, such as neurological and cerebrovascular diseases or cardiac and circulatory diseases. We also produced a minimal set of 14 consensus recommendations addressing the three types of validity threat.8 If a trial involves a disease area covered by a focused guideline, evaluators can judge the clinical promise of a given intervention based on whether the methods in the guideline(s) have been implemented. Alternatively, or in combination with specific guidelines, reviewers can tap the collective wisdom of the preclinical research community by drawing on our 14 consensus recommendations.

In particular, reviewers should assess the extent to which efficacy studies have minimised threats to internal validity by asking at least the following six questions:

(1) Was the sample size sufficient to exclude random variation as an explanation of effect sizes?

(2) Were animals randomly assigned to treatment?

(3) Were outcome assessors blinded?

(4) Do studies explain the flow of animals from inclusion through to analysis?

(5) Were appropriate controls included?

(6) Was a dose–response relationship demonstrated?

Reviewers can assess the extent to which construct validity threats were addressed by examining, at a minimum, whether (1) animals were characterised at baseline and included based on prespecified eligibility criteria, (2) the animal models used are widely believed to offer the most faithful available representation of the human disease that will be tested in trials, (3) studies establish a mechanism of action, (4) the outcomes used in studies are good representations of the clinical outcomes used in trials and (5) animal age is matched to the age of patients (often, it is possible to scale an animal's age to that of a human being based on developmental milestones, longevity, etc; see, for instance, ref. 9).

The extent to which external validity threats have been addressed can be assessed by determining whether efficacy studies were replicated in (1) more than one model, (2) more than one laboratory and (3) more than one species. The importance of testing in more than one system is underscored by the fact that animal models are often flawed representations of human disease, and may capture a narrow set of disease phenomena. Observing activity in multiple models provides greater confidence that the activities are robust and address more fundamental disease processes.

Failure to implement any one of the above practices in the context of otherwise promising effect sizes should qualify a reviewer's confidence that activities in animal studies will generalise to patients.
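Evaluators who want to keep a written record of a Step 1 review could encode the questions above as a simple checklist. The sketch below is a hypothetical illustration, not a tool proposed in this article: the item wordings paraphrase the questions in the text, and the names `CHECKLIST`, `unaddressed` and `summarise` are invented for the example. It only flags practices that were not implemented; it deliberately does not compute a score, because the weight of each practice depends on the setting.

```python
from __future__ import annotations

# Hypothetical encoding of the Step 1 questions, grouped by validity threat.
# Wordings are paraphrased from the text; they are not official guideline items.
CHECKLIST = {
    "internal": [
        "sample size sufficient to exclude random variation",
        "random assignment to treatment",
        "blinded outcome assessment",
        "flow of animals reported from inclusion to analysis",
        "appropriate controls",
        "dose-response relationship demonstrated",
    ],
    "construct": [
        "baseline characterisation and prespecified eligibility",
        "most faithful available disease model",
        "mechanism of action established",
        "outcomes represent trial outcomes",
        "animal age matched to patient age",
    ],
    "external": [
        "replicated in more than one model",
        "replicated in more than one laboratory",
        "replicated in more than one species",
    ],
}


def unaddressed(addressed: set[str]) -> dict[str, list[str]]:
    """Return, for each threat type, the checklist items not in `addressed`."""
    return {
        threat: [item for item in items if item not in addressed]
        for threat, items in CHECKLIST.items()
    }


def summarise(addressed: set[str]) -> str:
    """Flag whether any listed practice is missing; no numeric score is given."""
    n_gaps = sum(len(gaps) for gaps in unaddressed(addressed).values())
    if n_gaps == 0:
        return "no listed threats left unaddressed"
    return f"{n_gaps} listed practice(s) unaddressed; qualify confidence"
```

For example, a body of evidence that was randomised and blinded but nothing else would leave four internal validity items, all five construct validity items and all three external validity items flagged by `unaddressed`.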

Step 2: combining with reference class evidence

At this point, a reviewer will have formed an opinion about a drug's clinical promise using efficacy evidence alone. However, drug candidates backed by strong animal evidence frequently fail human trials; conversely, drug candidates backed by weak animal evidence may succeed in well-understood realms (eg, ‘me-too’ drugs). Often, this is because animal models, or the circumstances of their use, are poor representations of the clinical scenarios. The second step is to adjust assessments of clinical promise from efficacy studies by examining how other agents in the same class, supported by similar evidence, have fared in clinical development. If most drugs in the same reference class have been successfully translated on similar evidence (a sign that the animal models used are predictive), this provides grounds for confidence that responses in animals will be recapitulated in human beings. If most drugs in the same reference class have not demonstrated efficacy in trials (a sign that the animal models used are not predictive), this provides grounds for caution in projecting clinical responses. For instance, imagine researchers are pursuing a cancer drug that inhibits a particular receptor. If several drugs targeting the same receptor were supported by similar preclinical evidence but failed translation, reviewers should be more guarded in their estimation of clinical promise. This step, and its relationship with models of disease, is discussed elsewhere.10
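The ‘most drugs in the reference class’ rule can be sketched in a few lines. This is a hypothetical illustration assuming the evaluator has already assembled prior agents in the same class, noted whether each was supported by preclinical evidence similar to the candidate's, and recorded whether it demonstrated efficacy in trials. The `ReferenceAgent` type and `reference_class_signal` function are invented names, and reading ‘most’ as a strict majority is an interpretive assumption.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ReferenceAgent:
    name: str               # hypothetical label for a prior agent in the class
    similar_evidence: bool  # supported by preclinical evidence like the candidate's?
    translated: bool        # did it demonstrate efficacy in trials?


def reference_class_signal(agents: list[ReferenceAgent]) -> str:
    """Direction in which to adjust Step 1 confidence (assumes 'most' = majority)."""
    comparable = [a for a in agents if a.similar_evidence]
    if not comparable:
        return "no adjustment (no comparable agents)"
    successes = sum(a.translated for a in comparable)
    if successes > len(comparable) / 2:
        return "grounds for confidence"
    if successes < len(comparable) / 2:
        return "grounds for caution"
    return "mixed; no clear signal"
```

In the receptor example from the text, several comparable agents that all failed translation would return "grounds for caution". Agents backed by dissimilar evidence are excluded, since the text asks only about agents supported by similar evidence.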

The next task is to feed estimates of clinical promise into evaluations of risk/benefit. A thorough account of this task is well beyond the scope of this paper, and has been discussed by other thoughtful commentators, including in the context of early phase clinical development.11–13 We nevertheless offer four broad factors that should inform the moral evaluation of risk/benefit. The first is the moral justification for the risks of an intervention. If an intervention implicates care obligations (eg, it will be substituted for an established effective therapy), then the ethical permissibility of its risks should be judged on whether its clinical promise is competitive with the standard of care. In previous work, one of us has argued that the moral justification for exposing patients to novel interventions in early phase studies will generally rest on the value of the knowledge a study is expected to produce, not on therapeutic value.14 This prompts a second consideration: risk to subjects, which is related to subjects’ medical options and expected research burdens. A third consideration is the degree to which a research programme opens up new avenues for investigation. For highly novel intervention platforms or targets, trials can often turn up findings that are applicable across other translation efforts even if they fail to recapitulate preclinical studies. A final factor is the expected value to healthcare systems should the intervention translate successfully. Studies of interventions that are believed to have transformative potential, or that address priority unmet health needs, permit greater subject risk than those pursuing incremental innovations.12,13 Our proposal for evaluating risk/benefit using efficacy studies is depicted in figure 1.

Figure 1

Process for assessing clinical promise and risk/benefit during trial evaluation.


We offer a hypothetical trial proposal to briefly illustrate how our approach might be applied. A researcher proposes a phase 1 trial investigating a novel cell therapy strategy for the treatment of myocardial infarction (MI). A systematic retrieval of preclinical studies turns up one that showed large responses in a ‘gold standard’ swine model of MI. The swine study was randomised and blinded, demonstrated the mechanism of action, and used endpoints appropriate to the model. However, the study used a small sample and has not been replicated, and its endpoint (reduction in infarct size) is a surrogate rather than a direct measure of clinical efficacy. The evaluator concludes her review of the preclinical evidence with a somewhat qualified level of confidence in clinical promise. Next, the evaluator sets this confidence against the past performance of cell therapies in MI, which have generally shown safety but clinically marginal benefit.15 The evaluator adjusts her confidence in clinical promise further downwards. She now feeds this into her evaluation of risk/benefit, which considers the expected burdens of the trial, the magnitude of unmet need for MI treatments, the medical options available for eligible patients and the overall scientific fecundity of this line of investigation. An unfavourable judgement might lead the evaluator to make acceptance conditional on the completion of more rigorous and favourable preclinical studies, or on a less aggressive research strategy (eg, starting at lower doses or enrolling patients with more advanced disease). If the evaluator is a clinical investigator, it might lead her to refuse to join the protocol and enrol patients.
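The hypothetical MI case can be traced through both steps in code. The sketch assumes an ordinal confidence scale (`LEVELS`) and a one-level downgrade at each step; neither is part of the authors' proposal, and both merely stand in for the qualitative judgements described above. Starting at "high" to reflect the large responses in the swine model is also an assumption, as are the names `downgrade`, `step1` and `step2`.

```python
from __future__ import annotations

# Hypothetical ordinal scale; the article does not define one.
LEVELS = ["low", "guarded", "qualified", "high"]


def downgrade(level: str, steps: int = 1) -> str:
    """Move down the hypothetical scale, stopping at the lowest level."""
    return LEVELS[max(LEVELS.index(level) - steps, 0)]


# Step 1 inputs: practices reported for the hypothetical swine study.
swine_study = {
    "randomised": True,
    "blinded": True,
    "mechanism_of_action": True,
    "adequate_sample": False,
    "replicated": False,
    "clinical_endpoint": False,  # infarct size is a surrogate endpoint
}


def step1(study: dict[str, bool], start: str = "high") -> str:
    # Any unimplemented practice qualifies confidence; a single downgrade
    # stands in for that qualification (an assumption, not a rule).
    return start if all(study.values()) else downgrade(start)


def step2(level: str, reference_class_mostly_marginal: bool) -> str:
    # Past cell therapies in MI showed marginal benefit, so adjust downwards.
    return downgrade(level) if reference_class_mostly_marginal else level


after_step1 = step1(swine_study)
after_step2 = step2(after_step1, reference_class_mostly_marginal=True)
print(after_step1, "->", after_step2)  # qualified -> guarded
```

The fixed downgrade sizes are arbitrary; in practice, the size of each adjustment remains a matter of the evaluator's judgement, and the result is only one input to the wider risk/benefit evaluation.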


Our proposal offers a structured approach to using efficacy studies when assessing risk/benefit for trials. By combining all accessible efficacy studies with reference class information, it capitalises on the totality of evidence. In these respects, it offers a systematic and relatively transparent method for incorporating preclinical findings into the moral evaluation of trial risk. It also removes some of the arbitrariness that likely plagues a process for which there are no established frameworks.

It might be objected that ethics committees lack the expertise to perform these evaluations. We stress that our analysis is intended to apply to any stakeholder who bears moral responsibility for maintaining a favourable risk/benefit balance in trials, including public funders, private sponsors and trial steering committees, who surely have the expertise. Insofar as ethics committees have a role in refereeing such judgements, we suggest that the review process can be facilitated by presenting study sponsors with a short template asking them to address each element listed in appendix 1; such templates are similar to those used in submitting manuscripts to biomedical journals such as Nature.16

Such assessments might also be delegated to others. For instance, institutions overseeing early phase trials might establish bodies that provide ‘special scrutiny’; they might include non-voting ad hoc expert reviewers in deliberations,17 or they could require that sponsors proposing early phase protocols submit an expert and independent scientific review of the preclinical evidence. The responsibility for maintaining a favourable risk/benefit balance in trials should not rest solely on the shoulders of ethics committees. Our analysis is not aimed at defining appropriate institutional mechanisms; in any event, these will vary depending on the contextual factors of review.

The approach has important limitations and demands further refinement, of course. First, we addressed only the first step in risk/benefit evaluation, namely the assessment of clinical promise. As already described, this judgement must be combined with other judgements to morally evaluate risk/benefit. Second, the approach should not be applied mechanically. The experimental practices in step 1 do not exhaust all factors that threaten the strength of evidence; also, some practices matter more in some settings than others (eg, non-randomisation is a greater liability in studies involving outbred animals). Third, reviewers should be alert to the fact that preclinical research is afflicted with publication and reporting bias,18 and some efficacy studies may be impossible to access. Given publication biases in preclinical research, evaluators should approach investigators and/or sponsors for preclinical efficacy evidence that might not be reflected in the study brochure. Fourth, the approach has not been rigorously validated. The considerations enumerated in step 1 are derived from a systematic review, and many of these recommendations are supported by evidence.19 Nevertheless, there are established methods for validating guidelines, and the proposal above should be put through such a process. Fifth, our approach assumes that the consensus recommendations contained in preclinical guidelines have identified the most pressing validity threats. It is possible that these guidelines overlook larger or more pressing threats. Last, our approach to reviewing preclinical studies is embedded within a potentially flawed, standard research ethics framework. In this framework, review committees and investigators are not instructed to contemplate the opportunity costs or externalised costs of investigation. Nor are they asked to contemplate the knowledge value of a protocol detached from subject risk.20 As a consequence, the approach we have described could sanction resource-intensive but marginally informative early phase studies, provided they do not expose subjects to undue risk.

Investigators and reviewers are charged with promoting a favourable balance of risk/benefit in trials, and this process begins with a systematic assessment of clinical promise. Nowhere is this task more challenging than in early phase trials. Fortunately, preclinical researchers themselves have articulated criteria for determining the strength of evidence of clinical promise. Our approach absorbs this collective wisdom into a structured method for assessing the probabilistic component of risk/benefit in early phase trials.


Supplementary materials

  • Supplementary Data

    This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.


  • Additional material is published online. To view please visit the journal (

  • Contributors JK and VH conceived of the manuscript and discussed its general outline. JK wrote the first draft, which VH revised. Both authors attest to their substantive contributions to the manuscript.

  • Funding Canadian Institutes of Health Research (grant no. EOG111391).

  • Competing interests None declared.

  • Provenance and peer review Not commissioned; externally peer reviewed.
