

Ethics and evidence based surgery
G M Stirrat

Correspondence to: Professor G M Stirrat, Emeritus Professor of Obstetrics & Gynaecology and Senior Research Fellow in Ethics in Medicine, University of Bristol, 73 St Michael’s Hill, Bristol BS2 8BH, UK


Traditionally, surgical practice has been experiential and based on the contemporary understanding of basic mechanisms of disease. It was both a science and an art and depended to far too great an extent on the individualism and self belief of its main exponents. “Evidence based medicine” (EBM) emerged in the 1980s and a new gospel of “Rules of Evidence” was introduced. There is no doubt that the net effect of EBM has been beneficial, but over reliance on randomised controlled trials and the lack of generalisability of scientific evidence to individual patients has perhaps led to less enthusiasm for its tenets among surgeons. There are valid and spurious reasons for this that are discussed. The situation is improving but inevitable tensions remain between the surgeon committed to the individual patient here and now, and the clinical researcher whose focus is the benefit of future patients in the larger community.



Evidence is not that which the mind does or must yield to, but that which it ought to yield to. John Stuart Mill, Logic III xxi (1846)

This article is predicated on four linked ethical imperatives. The first is that all medical practitioners must make the interests of their patients paramount. The second is that “any recommendation to a patient, a colleague, or those third parties to the doctor-patient relationship such as economists, lawyers, insurers, or hospital managers must be supportable on (best available) evidence”.1 The third imperative is that all new interventions and procedures must be properly compared with the currently accepted method(s).2 The fourth is that those who do not fulfil the first three must be held to account. The second and third imperatives are the main subject of this article, but some attention will also be paid to the first.


Meakins has reminded us that the development of surgical practice has traditionally been largely experiential.3 “We have”, he says, “been trained in a hierarchical environment where the professor or chief might define the way in which not only most clinical situations were to be managed but also how the operation was to be done”. This is consistent with Garry’s view that “after more than 100 years of experience of the most commonly performed major surgical operation in the world, the gynaecological profession as a whole has no clear indication of the optimum method by which to perform a hysterectomy in differing situations”.4 He refers to the summary of the situation by Johns et al, “the route of hysterectomy is usually determined by the skill, experience and preferences of the operating gynaecologist. Few other parameters matter”.5 As Wood1 points out, “Many surgical procedures and other therapies are considered standard therapy without ever having been subject to rigorous evaluation” and new operations have appeared without rigorous scrutiny or comparison with currently accepted methods.3 The traditional paradigm within which surgery developed is outlined in box 1.

Box 1 Development of guidelines for surgical practice: traditional paradigm

  1. Founded on the study and understanding of basic mechanisms of disease and principles of pathophysiology.6

  2. Based on clinical experience and individual surgical expertise.6

  3. Published series: one surgeon’s experience of a series of patients treated by his new procedure was compared to previously published series of another operation. Better outcomes were attributed to the new procedure when they were probably due to nothing more than biased comparisons between different populations with a multitude of other differences such as age, stages of the condition (or even different diseases), criteria for treatment and measured outcomes.1 Surgeons sought to “legitimise their enthusiasm by comparing personal results, in cases chosen by themselves and operated on by experienced consultant surgeons committed to the task”.7

  4. Series using historical controls: the surgeon compared the results of a new operation with those previously obtained in his hospital using another procedure. Open to serious bias due to assumption that nothing has changed except the new procedure. Incorrect conclusions can be reached in 40 to 60% of such studies.8 May occasionally be useful but only if the new procedure produced dramatic improvements in outcome.1

  5. Series using concurrent, non-randomised controls: this too is liable to operator and population sampling biases.

  6. Randomised controlled trials: these were uncommon and often carried out with great difficulty, usually some considerable time after the introduction of the procedure. For example, the use of gastric freezing for duodenal ulcer was introduced in 1962 but not discarded until 1970 after a randomised trial by Ruffin et al9 showed a significant risk of gastric gangrene.


Considered in isolation, the above analysis seems rather critical of surgeons and their practice. Before rushing to such judgement it would be wise to try to understand the context in which the paradigm discussed above developed. In East Coker, the second of his Four Quartets, TS Eliot10 communicates the art, science, craft, and commitment of the surgeon:

The wounded surgeon plies the steel
 That questions the distempered part
 Beneath the bleeding hands we feel
 The sharp compassion of the healer’s art,
 Resolving the enigma of the fever chart.

The late Roy Porter reminds us, “For thousands of years surgery had been a business of boils and broken bones, hernias, venesection and the occasional amputation”.11 The factors that placed severe limitations on what could be successfully achieved were lack of understanding of the nature and causes of disease, pain, and infection. John Hunter (1728–93) has correctly been called “the true founder of scientific surgery” because his clarity of inductive and deductive reasoning made him strive “to link structure and function and to know not only the diseases but their causes”.12 In the 18th century this was an even greater paradigm shift than that associated with the introduction of evidence based medicine at the end of the 20th century.

Anaesthesia was to prove the catalyst for developments in surgery in the decade from 1850 and another surgical giant, Joseph Lister, reported successful antisepsis using dilute carbolic acid in the Lancet in 1867.12 The survival with intact limbs of nine out of 11 patients with compound fractures, so treated when amputation had previously been inevitable and death likely, did not require a randomised trial to demonstrate its efficacy! Between 1877 and 1893 trauma and orthopaedics (much of it as a result of tuberculosis) dominated Lister’s practice and he did not attempt abdominal surgery until after 1893.11 Porter11 describes Theodor Billroth (1829–94) as “the Columbus of the new surgical techniques” and his surgical innovation was derived from studying the underlying pathophysiology of, for example, wound healing, inflammation, and haemorrhage. His technique was described as superb and his temperament dauntless. The latter is code for the fact that “his new methods sacrificed many lives but, as his practices became refined and postoperative care improved, mortality rates dipped”!11 Like many of his surgical contemporaries his self belief was sufficient to drive him on, despite the initial deaths, and prevented any doubt about the ethics of his practice. The issue of the surgical “learning curve” is still with us. The first two, and to some extent the third, criteria described in box 1 were now established, but why did the traditional paradigm of surgical practice continue until the latter part of the 20th century? Possible reasons can be considered in the context of ethos and circumstances as in box 2.

Box 2 Suggested reasons for persistence of the traditional paradigm


Ethos

  1. Surgery involves action and, therefore, surgeons “do things”. Those attracted to the specialty (predominantly men) may have tended to be less reflective than physicians.

  2. Succeeding generations of surgeons learned techniques, skills and attitudes by apprenticeship with a consultant or chief whose authority was difficult to question.

  3. The vast majority of surgeons felt sincerely (and some were totally convinced) that they always acted in the best interests of their patients. In light of this what more was required?

  4. Clinical practice focussed on the individual patient and was, therefore, less well equipped to see him/her within a community.

  5. Surgery is, by definition, an invasion of the patient’s bodily integrity and is “all or nothing” (one cannot do half an operation after all!). In order to justify the procedure to himself and the patient, the surgeon had to travel further along the road of self belief than his physician colleague.

  6. Having to confess uncertainty was perceived as undermining the patient’s confidence in the surgeon.

  7. A surgeon gained personal kudos and, sometimes, private practice, from his expertise and innovation. These might be threatened by rigorous testing.13


Circumstances

  1. Historically, most surgery was in response to acute clinical problems. This was especially so as a result of:

    • the industrialisation and urbanisation of society

    • the two World Wars and other armed conflicts

    • emergency surgery being difficult to assess systematically.

  2. Elective surgery only became normative within the last one to two generations (but is currently in retreat again in the UK due to capacity problems in the NHS).

  3. Established patterns of practice in each succeeding generation of surgeons were difficult to change.

  4. The incontrovertible and self evident success of some new interventions even in the 20th century, such as blood transfusion and antibiotics, reinforced the received wisdom.


The development and application of the discipline of epidemiology, “that branch of medicine concerned with describing and explaining the occurrence of disease in populations”,14 was, in my opinion, the key that opened the door to a new paradigm of healthcare delivery. The specific catalyst was the application of epidemiological methods to clinical practice “so as to evaluate the natural history and outcome of illnesses and the performance of diagnostic tests and treatments”.14 In my own specialty of obstetrics, the setting up in Oxford of the National Perinatal Epidemiology Unit (NPEU) in 1978 by the Department of Health was crucial, not only for improvements in perinatal care but as a model for the subsequent establishment of the Centre for Evidence-based Medicine and the UK Cochrane Centre, both also in Oxford, in the early 1990s.


A new paradigm for medical practice emerged in the 1980s (mainly in North America and the UK) under the overarching title of “evidence based medicine” (EBM). Sackett, one of its main exponents, has defined it as “the conscientious, explicit and judicious use of current best evidence in making decisions about the care of individual patients”.15 It has been well described in earlier articles. This new paradigm puts a much lower value on authority than previously. The underlying belief is that clinicians can gain the skills to make independent assessments of evidence, and thus evaluate the credibility of opinions being offered by experts.


Géronte: It seems you are locating them wrongly: the heart is on the left and the liver is on the right. Sganarelle: Yes, in the old days that was so, but we have changed all that, and we now practise medicine by a completely new method. Molière, Le Médecin malgré lui (1667)

Once a new paradigm is adopted, it tends to be embraced enthusiastically and uncritically and then gradually finds its proper place, only to be replaced at some time in the future with yet another. Black has suggested, “Although EBM clearly has a place (in surgery), it does not have all the answers”.16 Among the shortcomings he suggests are the over reliance on randomised controlled trials (RCTs), the lack of generalisability of scientific evidence to individual patients, the lack of attention to third party interests, the threat to the “art” of medicine, and the dangers of an over simplistic approach. The first of these is considered further below. The second is a reversal of the previous situation (see box 2) where clinical practice was almost totally patient centred and there is a danger that we have gone too far the other way. There are two different aspects to this issue. The first is empirical in that “RCTs are conducted with a heterogeneous groups of patients and the trial results represent an estimate of the average difference in the responses of the treatment group”.17 Thus trial evidence, however well randomised, suggests only what was more effective for that group of patients with a particular condition, and not necessarily for Jennifer Smith sitting in your surgery or outpatient clinic. Other variables have to be considered in that context. The second is a matter of policy, in that one of the main driving forces behind EBM is the justifiable wish to allocate resources to those interventions that are effective, and withdraw them from those that are ineffective.15 This can never be divorced from the lack of resources allocated for even demonstrably effective interventions. The argument that “more evidence is required” can be used to screen the true reason—lack of resources. 
Thus, the theory that EBM ultimately works in the best interests of patients does not readily translate into the individual consultation: “Yes, Mr Jones, I agree that the evidence suggests that you would benefit from X but I regret that those purchasing health care in this locality consider that more evidence is required and, anyway, do not have the resources to buy X. Sorry!” Eypasch noted the potential conflict between the benefit accruing to the patient from consideration of the currently available best scientific evidence and some inherent serious limitations of EBM.18

The decreased emphasis on authority was not intended to imply a rejection of what one can learn from colleagues and teachers whose insights from years of experience can never be gained from formal scientific investigation. However, some of the more evangelical advocates of EBM (particularly those not involved in acute medicine or surgery) seem to discount clinical experience or understanding of pathophysiology altogether. It is, of course, true that the processes of EBM correctly and necessarily allow us to question our clinical practice and those being trained in the surgical specialties should learn and embrace its principles. Accumulated clinical experience can be used to inform, or at least interrogate, EBM. Surgical trainees who, for a variety of reasons (at least in the UK), are becoming consultants with considerably less direct surgical experience than their predecessors, are less able to place the evidence in the context of experience (or vice versa).


Buchwald, himself a surgeon, argues that “surgical procedures and devices should be evaluated in the same way as medical therapies, namely by RCTs”.19 Daya states, “There is a growing consensus that the results of RCTs provide the most secure basis for valid inferences about the effects of treatments. However, although they have the characteristics of a true experimental design, RCTs pose several unique challenges”.20 This issue is discussed further later. In 1985 Salzman found that, between 1940 and 1980, RCTs were reported as being used in only 10–20 per cent of studies evaluating surgical practice.21 Have things changed? It is difficult to say given the poor quality of evidence by any, let alone EBM, standards. (Should an RCT be done on the use of RCTs?) Solomon and McLeod reviewed all clinical studies published in three surgical journals in 1980 and 1990 and concluded that, by 1993, there had been no overall increase in the proportion of stronger clinical trial designs.22 Ko et al found that, by 2000, the number of RCTs and the quality of reporting for diseases of the colon and rectum had improved.23 Howes et al audited the treatment of 100 surgical patients admitted under two consultants at a Liverpool teaching hospital.24 They categorised evidence as (1) supported by RCTs; (2) sufficient other evidence “to make a placebo-controlled trial unethical”, or (3) neither of the above. The treatment of 24 and 71 patients was in categories 1 and 2 respectively and was, therefore, deemed to be “based on satisfactory evidence”. They concluded that, in their experience, inpatient surgery was evidence based but the proportion of surgical treatments supported by RCTs was “much smaller than that found in general medicine”. In 1998 Beger and Schwarz reported infrequent use of controlled clinical trials for answering clinical questions in Germany. It was particularly poor for surgery.25 Millat et al carried out a survey among a random sample of 152 general and gastrointestinal surgeons in France.26 They concluded that “French surgeons particularly those aged 50 or over, are not well informed about the nature, conduct, and value of RCT. Most of their information is acquired through reading and attending scientific meetings and congresses. Surgeons tended to attach more importance to the fame of the author than to the conduct of the study. The overall impact of RCT was weak among the surgeons questioned”. Mildon et al found a similar situation among cataract surgeons in British Columbia.27 Hardin et al28 and Moss et al29 carried out literature searches for evidence based practice in paediatric surgery. They both concluded that clinical trials were used infrequently but the former reported an increase in prospective case controlled studies and RCTs in the 1990s. The latter found that, when RCTs were used, they often suffered from poor trial design, inadequate statistical analysis and incomplete reporting. Kenny et al30 carried out an identical audit to that of Howes et al24 for paediatric surgery, also in Liverpool. Of 281 interventions 11 per cent were based on “controlled trials”, 66 per cent on “convincing non-experimental evidence”, and “only 23 per cent” were without substantial supportive evidence. Their rather complacent conclusion is that “in common with other medical specialities” (no evidence adduced) “the majority of paediatric surgical interventions are based on sound evidence”. They do not seem overly concerned about the 65 treated using interventions “not based on sound evidence”! They also suggest that lack of RCT data may be a reflection of the nature of surgical practice. That question is considered below.
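The arithmetic behind the two Liverpool audits can be checked with a short sketch. It is illustrative only: the helper name count_from_percent and the rounding of each reported percentage to a whole number of interventions are assumptions made here, not part of either audit's published method.

```python
def count_from_percent(total: int, percent: float) -> int:
    """Approximate count implied by a reported percentage of a total (assumed rounding)."""
    return round(total * percent / 100)


# Howes et al: 100 patients, 24 in category 1 (RCTs) and 71 in category 2.
howes_total = 100
howes_supported = 24 + 71
howes_unsupported = howes_total - howes_supported

# Kenny et al: 281 interventions, with the percentages quoted in the text.
kenny_total = 281
kenny_percentages = {
    "controlled trials": 11,
    "convincing non-experimental evidence": 66,
    "without substantial supportive evidence": 23,
}
kenny_counts = {
    label: count_from_percent(kenny_total, pct)
    for label, pct in kenny_percentages.items()
}

print("Howes et al, supported / unsupported:", howes_supported, howes_unsupported)
for label, count in kenny_counts.items():
    print(f"Kenny et al, {label}: {count}")
```

On these assumptions the 23 per cent in the paediatric audit corresponds to the 65 interventions mentioned above, and the three categories together account for all 281.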

There is at least circumstantial evidence for believing that the situation will improve. For example, the National Institute for Clinical Excellence (NICE) in England and Wales and similar bodies elsewhere use EBM principles to discern “whether interventional procedures used for diagnosis or treatment are safe enough and work well enough for routine use”;31 national clinical guidelines are increasingly evidence based; the major medical journals are encouraging submission of reports of evidence based studies and at least seven evidence based journals have been established; new generations of textbooks are becoming more evidence based; medical students are learning EBM, and professional training courses teach EBM and qualifying examinations set questions based on it.


There are several spurious reasons for the rejection of RCTs by surgeons.2 Among them are unjustifiable self belief, unwillingness to confess uncertainty, “surgical RCTs are too difficult”, ignorance about rules of evidence, misunderstanding of what RCTs are about, unwillingness to participate unless patients are allocated to the doctor’s preferred treatment and concerns that EBM is all about cost containment and, therefore, to be resisted. There are, undoubtedly, some valid concerns that need to be addressed. Meakins,3 himself a surgeon, considers that “the framework of how to evaluate and test surgical therapies, indeed most technical acts, has not been well defined and may be very different from the approach to the assessment of a new drug”. His “central hypothesis is that the rules of evidence are different for surgery and that they require clear definition and an intellectual framework into which the evaluation of innovation and the progress of the field can be placed”. He argues, with some justification, that “the dogmatism of the hierarchy (of evidence) suggests that there is no other way of defining a recommendation” and he questions whether these hard rules of evidence should be universally applicable to surgery and other “procedural specialties”. He refers to situations where an RCT was entirely appropriate, such as studies comparing operations for breast cancer and carotid endarterectomy versus medical management. However, where some therapeutic intervention is required and the options are limited, an RCT would be inappropriate (for example, resection of cancers, drainage of an abscess, a perforated viscus, a ruptured aneurysm, or fracture). He also considers that carefully performed observational studies with appropriately defined, measured, and documented outcomes can be appropriate for quality of life conditions such as hip replacement or breast reduction surgery.

A detailed critique of RCTs is not within the scope of this article. Simon32 does this usefully when he asks the question “is the RCT the gold standard of research?” While acknowledging that, where feasible, RCTs are the best way to assess the outcomes and safety of all medical interventions, I wish to review some issues with clear ethical implications for surgical research.


For it to be ethical to recommend involvement in a clinical trial, there must be genuine uncertainty about the benefit or harm from an intervention or about the relative merits of alternative treatments. Both surgeon and patient must share this equipoise. If a surgeon considers he knows the preferred option, even if he has no grounds for doing so, the patient will not be offered the chance of entering the trial. One trial the value of which was, in my opinion, reduced by this phenomenon was the trial of cervical cerclage in the management of suspected cervical incompetence.33 Patient preference, in the absence of any real evidence, will have a similar effect. Although those included in a trial will not show population bias, its value may be reduced by the lack of participation of those who were, in fact, eligible. Wennberg has suggested that a “preference trial, may, on some occasions be preferable to a RCT”.34 This is the “systematic follow up of patient cohorts where treatment assignments are made according to informed [author’s italics] patient choice rather than by randomisation”. How then is the information to be gathered to inform the patient’s choice?

Surgeon centred issues

There are several surgeon centred issues that inevitably impinge on the assessment of a new procedure. Meakins3 asks, “In the establishment of a new procedure, when should the RCT be started? Can the initiators do the RCT, or does another group? If so, when on the learning curve?” Given the variability in proficiency and technique among surgeons, the necessary standardisation of operative technique is problematic.35 Meakins proposes that a new procedure should, firstly, be assessed by a systematic review of the problem and its management. It would then be subject to a prospective non-randomised trial (from the first patient). He considers that “the non-randomised trial will be the lynchpin of the knowledge development for innovative solutions to surgical disease”. This is discussed further below.

Blinding and the placebo effect

Interest in the use of “sham” or placebo surgery in RCTs has been rekindled by its recent use in cell based therapy for Parkinson’s disease36 and arthroscopic surgery.37 Horng and Miller acknowledge that reasonable people are bound to differ over the ethics of such a controversial practice.38 Their utilitarian argument is that the primary aim of RCTs is to improve patient care in the future and that they “are not designed to promote the medical best interests of enrolled patients”. They consider that “the use of placebo surgery must be evaluated in terms of the ethical principles appropriate to clinical research which are not identical to the ethical principles of clinical practice”. They justify this view by reference to a seminal paper by Emanuel et al on the ethics of clinical research.39 The latter propose seven necessary, sufficient, and universal requirements of clinical research. They discuss placebo controlled trials only in the context of a drug trial they consider to be unethical because, in their view, it did not fulfil the necessary requirements. “Sham” surgery is not discussed. Macklin40 and Dekkers and Boer41 consider that the sham surgery for Parkinson’s disease was morally unacceptable. The latter suggest “the notions of therapeutic misconception and of the integrity of the body, and the difficulties in assessing the balance between risks and benefits provide strong arguments against sham surgery, but are in themselves not decisive”.41 The clinching argument they adduce against sham surgery is that there was, in their opinion, an alternative, less harmful research design that could have provided comparable empirical evidence. The argument cannot be settled here but it is clear that tensions are inevitable between the sincerely held views of the clinical researcher and those of the surgeon.


The evaluation of new surgical procedures has traditionally been the responsibility of individual surgeons, as described in box 1. This is no longer clinically and ethically acceptable because it fails all but the first of the four ethical imperatives underlying this article.

Evaluation will involve several levels of activity covering audit, systematic review, and research protocols. The entire process must fulfil several criteria among which are that it must be nationally based (but linkable internationally), effective, efficient, rigorous, objective, and as comprehensive as possible. In England and Wales responsibility for the evaluation of interventional procedures has been devolved by the Department of Health to the National Institute for Clinical Excellence (NICE).31 In Australia this function is carried out by the Australian Safety and Efficacy Register of New Interventional Procedures - Surgical (ASERNIP-S), nested in the Royal Australasian College of Surgeons.42 As a baseline all existing interventional procedures should be registered and reviewed. Thereafter all “new” procedures (that is, those that are innovative or significantly different from those currently practised) should be evaluated. Submissions should be voluntary but the Royal Colleges in the UK and their equivalents elsewhere should make it clear that their members are expected to comply. Confidentiality must be ensured. In light of the commercial imperatives behind many new interventional procedures, those carrying out the evaluation need to be properly indemnified. It has been suggested31 that clinicians who wish to undertake a new procedure between notification and the issuing of guidance should inform the chief executive of their Trust or hospital of their intention; inform patients of the status of the procedure and the uncertainty around its safety and efficacy; and consider seeking advice from the local research ethics committee. The process by which NICE develops guidance on an interventional procedure starts with notification, and involves overview, initial independent review, followed by a systematic review if deemed necessary. Consultation documents are produced, culminating in the issuing of guidance on the procedure to the NHS in England and Wales. For Campbell and Maddern43 (from NICE and ASERNIP-S respectively), “Success requires a balance between the primary aim of protecting patients and the need to encourage and foster innovation”. They point out that the system is so expensive that “funding is unlikely ever to be sufficient for collection of data on all procedures” and remind us that safety and efficacy require a long term perspective. More traditional research studies will, of course, also be necessary. On some occasions these can be RCTs. As previously noted, Meakins3 suggests that carefully performed observational studies with appropriately defined, measured, and documented outcomes are appropriate for many surgical interventions. In addition he proposes that new procedures should, firstly, be assessed by a systematic review of the problem and its management followed by a prospective non-randomised trial (from the first patient). This may be desirable but is it achievable in practice?


All surgical and other interventional procedures must be subjected to rigorous, objective, and—if possible—prospective evaluation. The contribution that EBM can make to this is acknowledged, but its simplistic and uncritical application to surgery is ultimately not beneficial to the individual patient.

