
Systematic reviews of empirical bioethics
D Strech1,2, M Synofzik1,3, G Marckmann1

1 Institute for Ethics and History in Medicine, University of Tübingen, Tübingen, Germany
2 Department of Bioethics, National Institutes of Health, Bethesda, Maryland, USA
3 Center of Neurology, Hertie-Institute for Clinical Brain Research, University of Tübingen, Tübingen, Germany

Correspondence to: Dr med Dr phil Daniel Strech, Institut für Ethik und Geschichte der Medizin, Universität Tübingen, Schleichstraße 8, 72076 Tübingen, Germany; daniel.strech{at}


Background: Publications and discussions of survey research in empirical bioethics have steadily increased over the past two decades. However, findings often differ among studies with similar research questions. As a consequence, ethical reasoning that considers only parts of the existing literature and does not apply systematic reviews tends to be biased. To date, we lack a systematic review (SR) methodology that takes into account the specific conceptual and practical challenges of empirical bioethics.

Methods: The steps of systematically reviewing empirical findings in bioethics are presented and critically discussed. In particular, (a) the limitations of traditional SR methodologies in the field of empirical bioethics are critically discussed, and (b) conceptual and practical recommendations for SRs in empirical bioethics that are (c) based on the authors’ review experiences in healthcare ethics are presented.

Results: A 7-step approach for SRs of empirical bioethics is proposed: (1) careful definition of review question; (2) selection of relevant databases; (3) application of ancillary search strategies; (4) development of search algorithms; (5) relevance assessment of the retrieved references; (6) quality assessment of included studies; and (7) data analysis and presentation. Conceptual and practical challenges arise because of various peculiarities in reviewing empirical bioethics literature and can lead to biased results if they are not taken into account.

Conclusions: If suitably adapted to the peculiarities of the field, SRs of empirical bioethics provide transparent information for ethical reasoning and decision-making that is less biased than single studies.

Empirical bioethics—understood as the application of social science research methods to examining bioethical issues1—has become a promising approach to provide important empirical data that can inform ethical reasoning and theoretical analyses. Accordingly, the number of empirical studies published in bioethical journals has steadily increased over the past two decades.2

As a consequence, we often find several empirical studies that investigate similar research questions but use different qualitative research methodologies (focus groups, in-depth interviews) and different quantitative survey instruments. This methodological diversity might result in varying research findings and hence incoherent conclusions. For example, several qualitative and quantitative studies have focused on the topic of healthcare rationing (HCR). They provide findings regarding physicians’ attitudes by interviewing physicians about the strategies they use for implicit or explicit bedside rationing, influencing factors in the process of cost containment, experiences of role conflicts and consequences for the patient–physician relationship. Comparing these studies, we found that they are methodologically rather heterogeneous and do not provide a coherent pattern of conclusions. For instance, qualitative studies differ in the range of themes and concepts reported,3–6 while quantitative findings differ in the relative importance of preferences for certain prioritisation criteria or role conflicts with regard to HCR.7–10 As the methodological complexity and variability in interview research on HCR and other ethical issues increase, so does the need for a systematic and transparent presentation, synthesis, and interpretation of the results.

To date, however, no attempts have been made to outline the major challenges and opportunities in summarising empirical research findings with relevance for bioethics. The fact that McCullough and colleagues recently presented a methodology for systematic reviews of conceptual or argument-based ethical publications underlines the increasing awareness in the field of the need to synthesise the results of the growing body of bioethical publications.11

In general, SRs aim to summarise large bodies of evidence and help to explain different results among studies addressing the same research question. SRs require the transparent application of scientific strategies to limit the bias inherent in retrieving, critically appraising and summarising the relevant studies that address a specific empirical question. Because the review process itself is subject to bias, a sound review requires explicit reporting of information and the application of systematic methods. During the past two decades, SRs of clinical trials have been increasingly used to inform medical decision-making, plan future research agendas and establish healthcare policies.12

However, traditional methods for systematically reviewing research findings (for example, most of the Cochrane reviews13) are limited in several ways regarding their application to survey research in empirical bioethics. For example, traditional SRs usually deal with issues (such as specific diseases and interventions) and study designs (such as randomised controlled trials) that correspond well to the controlled vocabulary of databases such as MEDLINE, EMBASE and others. Choosing search terms for an appropriate search algorithm, therefore, does not pose a big challenge. In contrast, systematic reviews of empirical bioethics mostly deal with interview research and rather specific ethical issues, for which it is much more difficult to find adequate search terms that are represented in the databases’ controlled vocabulary. Because the search terms relevant for empirical bioethics are heterogeneous and differ between databases, search algorithms for SRs of empirical bioethics have to be adapted to each database’s vocabulary to enhance the sensitivity and specificity of literature searches. Further limitations of traditional SRs will be discussed in more detail in the following sections.


In this paper we present and evaluate a new stepwise approach for SRs in empirical bioethics, highlighting the major differences compared with traditional SRs. The underlying questions are, what are the specific requirements of SRs in empirical bioethics and how can they be appropriately addressed? We illustrate our approach with a systematic review of interview research in the field of HCR. Our approach could serve as a reference work for the stepwise process of SRs of empirical bioethics literature and provide a starting point for future projects systematically reviewing other fields of empirical bioethics. Table 1 summarises the stepwise review process and the practical recommendations, which we will discuss in greater detail within the following sections.

Table 1 The 7-step approach to systematic reviews in empirical bioethics

Careful definition of review question

The first important step for a systematic review is the definition of a precise review question. The anatomy of a good clinical question addressed by traditional SRs typically contains four aspects, also known as the PICO model: patient (or problem), intervention (or exposure), comparison and outcomes.14 However, most review questions on empirical bioethics deal with interview research and thus concentrate on different foci. In a review of research methodologies used in empirical bioethics, Borry and colleagues showed that the great majority (92%) of empirical studies published in bioethical journals applied non-experimental study designs and data collection methods, such as quantitative and qualitative interviews and survey research.2 These research paradigms do not focus on interventions or clinical outcomes. Because comparisons and clinical outcomes do not play a role in summarising interview research, the PICO model is not suitable for defining a review question for current empirical studies in bioethics. The PICO model could, however, be applied if empirical bioethics used experimental methods that involve comparisons and focus on specific outcomes. One example is psychometric research in bioethics, such as the validation and application of questionnaires that try to measure patient decision-making competence (PDMC, or informed-consent research). Future empirical bioethics might also be involved in studying social interventions that deal with comparisons and outcomes. The Campbell Collaboration, for instance, produces, maintains and disseminates systematic reviews of research evidence (randomised trials) on the effectiveness of social interventions (for further information see

In addition, other factors such as the methodology of the empirical study and the participants in the interview or survey research are important determinants for reviews of empirical studies in bioethics. Therefore, we have developed an “MIP” (methodology, issues, participants) model that takes into account specifically the essential aspects of review questions within empirical bioethics: methodology (such as in-depth interviews or questionnaires), issues (such as HCR or end-of-life decision-making), and participants (for example, physicians or patients). Search terms for the systematic search in bibliographic databases should match with these three aspects to guarantee sensitivity and specificity in the retrieval of relevant literature (see also the section about the development of a search algorithm). We recommend limiting the range of methodologies, issues and participants included in one systematic review. While the experiences and attitudes of different stakeholders are all relevant for deciding about bioethical issues, it is not feasible to summarise them in one review because of methodological constraints. The feasibility of SRs of survey research mainly depends on the scope of the review question. For reasons of comparability and practicability, it is necessary to focus on specific methodologies, issues and participants when systematically reviewing interview research. For example, in one of our SRs we focused on qualitative research methods (methodology), HCR and resource allocation (issues), and physicians (participants).15

Selection of relevant databases

Meanwhile, various electronic bibliographic databases are available that include conceptual and empirical bioethics literature. For further information see the National Reference Center for Bioethics Literature ( or the European Information Network ( The selected databases determine which articles will be found.16 In order to reduce the danger of a potential bias, authors of SRs should combine several resources to find all references relevant for the selected review question. Traditional SRs prefer databases that include a wide range of publications of clinical trials, such as MEDLINE and EMBASE. Interview research in empirical bioethics, however, has at least two peculiarities. (1) Interview research is often indexed in databases other than MEDLINE and EMBASE. For example, databases, such as CINAHL, that include psychological and sociological articles are also important. (2) Even more importantly, databases that specifically focus on ethics often provide further relevant references that might be neglected if the literature search is limited to, for example, MEDLINE or EMBASE, even though BIOETHICSLINE is included in MEDLINE. While databases such as CINAHL or PsychInfo have proved to be effective in SRs of interview research, we still lack experience with specific bioethical databases, such as EUROETHICS. Therefore, we will present and compare our findings in the different databases mentioned above in the following sections, which will allow a preliminary assessment of the databases’ specific relevance for SRs in empirical bioethics.

Ancillary search strategies

Traditional SRs sometimes use ancillary search strategies to improve the sensitivity of their search. Common ancillary search strategies include the review of bibliographies from relevant references or manual search of journals that are not listed or not completely listed in the databases used.17 Since databases do not specialise in indexing interview research in bioethics, ancillary search strategies are especially important for SRs on empirical studies in bioethics.

Developing a search algorithm

Determining appropriate search terms for the area of interest is essential for the effective use of bibliographic databases. Search terms should match with the controlled vocabulary used by the relevant databases for indexing references. MEDLINE, for example, uses terms from MeSH (medical subject headings). These headings are the keys that unlock the medical literature.18 Because search terms in traditional SRs typically include specific diseases, medical interventions, and study designs (randomised controlled trials, observational studies) that correspond to the controlled vocabulary of most databases, developing an appropriate search algorithm is not a big challenge.

For SRs of interview research in bioethics, however, the situation is quite different. First, ethical issues are sometimes not adequately represented in the databases’ controlled vocabulary, or the databases use different terms for the same issue. For instance, while the controlled vocabularies of MEDLINE, EMBASE, CINAHL, PsychInfo, and EUROETHICS all include the term “resource allocation”, only those of MEDLINE (including BIOETHICSLINE) and EUROETHICS include the term “health care rationing”. The same problems occur with terms that describe the participants and the paradigm in interview research. Second, even if we use search terms included in the database-specific controlled vocabulary, we face the practical problem that the databases sometimes do not use these terms to index the relevant references. In our SRs, we found that publications of qualitative or quantitative interview research are often not indexed by appropriate MeSH terms such as “qualitative research”, “focus group”, “survey” or “questionnaire”, even if the database’s controlled vocabulary includes these terms.

With regard to these conceptual and practical challenges, we suggest three strategies that can help to identify the appropriate database-specific search terms and to use them adequately. Each strategy has proven to be helpful in our SRs.

  • Search terms have to be adapted to each database to develop a search algorithm with both good specificity and good sensitivity. Because of the differences in the databases’ controlled vocabulary, the search terms that have been successful in searching references in one database might not be successful in searching in another database.

  • To find adequate search terms one first has to become acquainted with the underlying mapping patterns of each database. Which headings are used by the database for indexing references that are of interest for a certain systematic review (SR)? Reviewers should look for the database-related headings used for indexing those relevant articles that are already known from prior non-systematic literature reviews. We call this strategy index mapping. Subsequently, reviewers should check the controlled vocabulary of each database to find further search terms that are relevant for the review question.

  • Finally, one has to combine the database-related search terms into one search algorithm by using the common Boolean operators “and”, “or” and (if useful) “not”. To balance the need for good sensitivity and specificity, we recommend building three clusters according to the MIP model. For instance, all database-specific search terms that deal with participants (for example, physician’s role, physician attitudes) should be combined by the Boolean operator “or”. Finally, the three clusters have to be combined by the Boolean operator “and”. We call this strategy cluster modeling. To illustrate this strategy, the database-related search algorithms and numbers of retrieved references in our SR are presented in table 2.
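As an illustration only, index mapping and cluster modeling can be sketched in a few lines of Python. The function names, the two sample records and the quoting style of the query are assumptions made for this sketch; the search terms themselves are examples taken from the text, and real databases each have their own query syntax, so the generated string would still need to be adapted per database.

```python
from collections import Counter


def index_mapping(known_records):
    """Count, per index heading, how many known relevant records carry it."""
    counts = Counter()
    for headings in known_records:
        counts.update(set(headings))  # count each heading once per record
    return counts


def cluster_query(clusters):
    """Join terms within each MIP cluster by OR, then join the clusters by AND."""
    parts = []
    for name in ("methodology", "issues", "participants"):
        terms = clusters.get(name, [])
        if not terms:
            raise ValueError(f"cluster '{name}' needs at least one search term")
        parts.append("(" + " OR ".join(f'"{t}"' for t in terms) + ")")
    return " AND ".join(parts)


# Hypothetical index headings of two articles already known to be relevant.
known = [
    ["health care rationing", "physician's role", "qualitative research"],
    ["health care rationing", "resource allocation", "focus group"],
]
heading_counts = index_mapping(known)

# One cluster per MIP aspect, filled with database-specific terms.
mip_clusters = {
    "methodology": ["qualitative research", "focus group"],
    "issues": ["resource allocation", "health care rationing"],
    "participants": ["physician's role", "physician attitudes"],
}
query = cluster_query(mip_clusters)

print(heading_counts.most_common(3))
print(query)
```

Headings that recur across the known articles are natural candidates for the corresponding cluster. Keeping one cluster dictionary per database makes the adaptation of terms to each vocabulary explicit.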

Table 2 Database-specific search algorithms

In our SR of qualitative interview research about HCR, the bibliographic search of five databases resulted in 614 references. Of the total number of references, 46% (283) were retrieved in MEDLINE, 31% (193) in EMBASE, 9.6% (59) in PsychInfo, 4.7% (29) in CINAHL, 3.4% (21) in EUROETHICS and 4.7% (29) by ancillary search strategies. There was an overlap in 6.4% (39) references: 18 were retrieved both in EMBASE and MEDLINE, 12 both in PsychInfo and MEDLINE and nine in both CINAHL and MEDLINE. After eliminating the overlapping results, the total number of references was 575.
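The arithmetic behind these figures can be reproduced with a short Python check. The counts are those reported above; the variable names are illustrative, and the sketch assumes that each overlapping reference was found in exactly two sources, which is what subtracting the 39 overlaps from the total implies.

```python
# References retrieved per source, as reported in the text.
retrieved = {
    "MEDLINE": 283,
    "EMBASE": 193,
    "PsychInfo": 59,
    "CINAHL": 29,
    "EUROETHICS": 21,
    "ancillary": 29,
}

# Pairwise overlaps with MEDLINE, as reported in the text.
overlaps = {
    ("EMBASE", "MEDLINE"): 18,
    ("PsychInfo", "MEDLINE"): 12,
    ("CINAHL", "MEDLINE"): 9,
}

total = sum(retrieved.values())
duplicates = sum(overlaps.values())
unique = total - duplicates  # assumes each duplicate appears in exactly two sources

shares = {source: round(100 * n / total, 1) for source, n in retrieved.items()}
overlap_share = round(100 * duplicates / total, 1)

print(total, duplicates, unique)
print(shares)
print(overlap_share)
```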

Relevance assessment of the retrieved references

Traditional SRs typically search for studies that involve distinct outcomes of specified medical interventions on specific patient populations. Assessing the relevance of the retrieved references, therefore, is rather a formal task. Little interpretation is needed, for example, to decide whether a certain study really deals with SSRI drug treatment for major depression among patients aged between 18 and 65. By contrast, in reviews of interview research, the relevance assessment involves a good deal of interpretation and therefore can be the step that is most susceptible to biases in decisions about inclusion or exclusion of information into the final summary. To deserve the label systematic, reviews in empirical bioethics have to follow the guiding principles of transparency and systematisation within the crucial steps of the relevance assessment. Based on our experience with SRs, we point out the following three strategies: (1) The relevance assessment has to be informed by a predefined list of inclusion and exclusion criteria. For example, inclusion criteria for references in our review were: (a) providing qualitative data through in-depth interviews, focus groups or surveys with open-ended questions; (b) being conducted in a developed or high-income country; (c) including practising physicians (general practitioners and specialists) as participants and (d) focusing on questions of rationing or resource allocation in healthcare but not allocation of organs or intensive care unit beds. (2) In assessing the relevance of the retrieved references, the reviewing authors should be blinded as thoroughly as possible. They should make their judgement based only on title and abstract. Potentially biasing information such as the authors’ names, the journal or the year of publication has to be eliminated from the list. 
(3) At least two experts in the field of inquiry should score the relevance of each reference in relation to the predefined inclusion criteria, using a classification scheme such as the following: (a) irrelevant: very poor in relation to the inclusion criteria; (b) slightly irrelevant: poor in relation to the inclusion criteria; (c) somewhat relevant: good in relation to the inclusion criteria, and (d) relevant: very good in relation to the inclusion criteria. Scores from the two experts should be compared to assess agreement and to evaluate the inter-rater reliability by intraclass correlation coefficients such as Cohen’s κ or Cronbach’s α.19 In cases of discrepancy, a third expert should be consulted to determine the final relevance value for each reference. To ensure transparency, all decision-making points within the relevance assessment have to be explicitly documented. Common instruments to present the different steps of the relevance assessment are flow charts (figure 1 shows the flow chart of our SR of qualitative studies).
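To make the agreement check concrete, the following Python sketch computes Cohen's κ for two raters using the four-level classification scheme above and lists the references that would go to a third expert. The function names and the sample scores are hypothetical; the optional linear weighting, which gives partial credit to near-misses on the ordinal scale, is an addition of this sketch and not something the text prescribes.

```python
from collections import Counter

CATEGORIES = ["irrelevant", "slightly irrelevant", "somewhat relevant", "relevant"]


def cohen_kappa(rater1, rater2, categories=CATEGORIES, weighted=False):
    """Cohen's kappa for two raters; linear weights if weighted=True."""
    if len(rater1) != len(rater2) or not rater1:
        raise ValueError("both raters must score the same non-empty list")
    n = len(rater1)
    k = len(categories)
    idx = {c: i for i, c in enumerate(categories)}

    def disagreement(i, j):
        if weighted:
            return abs(i - j) / (k - 1)  # distance on the ordinal scale
        return 0.0 if i == j else 1.0

    marg1 = Counter(idx[a] for a in rater1)
    marg2 = Counter(idx[b] for b in rater2)
    observed = sum(disagreement(idx[a], idx[b]) for a, b in zip(rater1, rater2)) / n
    expected = sum(
        disagreement(i, j) * marg1[i] * marg2[j] for i in range(k) for j in range(k)
    ) / (n * n)
    if expected == 0:
        return 1.0  # both raters used the same single category throughout
    return 1 - observed / expected


def discrepancies(ref_ids, rater1, rater2):
    """References on which the two raters differ (to be settled by a third expert)."""
    return [r for r, a, b in zip(ref_ids, rater1, rater2) if a != b]


# Hypothetical scores for four blinded references.
ids = ["ref-1", "ref-2", "ref-3", "ref-4"]
expert_a = ["relevant", "relevant", "irrelevant", "irrelevant"]
expert_b = ["relevant", "irrelevant", "irrelevant", "irrelevant"]

kappa = cohen_kappa(expert_a, expert_b)
to_third_expert = discrepancies(ids, expert_a, expert_b)
print(kappa, to_third_expert)
```

In this toy example the raters agree on three of four references, but half of that agreement is expected by chance given their marginal scores, so κ is well below the raw agreement rate.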

Figure 1 Flow chart for relevance assessment.

In our review of qualitative interview studies of HCR, a total of 1.7% of the references (10 of 575) were considered relevant to the SR question. Only MEDLINE (8 references), EMBASE (5), CINAHL (2) and the ancillary search strategies (1) provided relevant references. MEDLINE retrieved the highest number of relevant references. All relevant references retrieved by EMBASE were also found by MEDLINE, whereas CINAHL and the ancillary search strategies each retrieved one additional reference that was not found by MEDLINE. EUROETHICS and PsychInfo did not retrieve any of the studies that were finally included in the review. Even though EUROETHICS might provide important conceptual bioethics literature, it was not helpful in our study for retrieving empirical studies in bioethics.
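A rough per-source yield can be derived by combining these counts with the number of references each source retrieved in the search step. The Python sketch below is a worked calculation only: the variable names are illustrative, the retrieved counts are the figures before duplicates were removed, and the per-source relevant counts overlap, so they add up to more than the 10 included studies.

```python
# Retrieved references per source (before removing duplicates), from the search step.
retrieved = {
    "MEDLINE": 283,
    "EMBASE": 193,
    "PsychInfo": 59,
    "CINAHL": 29,
    "EUROETHICS": 21,
    "ancillary": 29,
}

# Relevant references per source, as reported in the text (sources overlap).
relevant = {
    "MEDLINE": 8,
    "EMBASE": 5,
    "PsychInfo": 0,
    "CINAHL": 2,
    "EUROETHICS": 0,
    "ancillary": 1,
}

unique_references = 575
included = 10

overall_yield = round(100 * included / unique_references, 1)
yield_by_source = {
    source: round(100 * relevant[source] / retrieved[source], 1) for source in retrieved
}

for source, pct in sorted(yield_by_source.items(), key=lambda kv: -kv[1]):
    print(f"{source:<11}{relevant[source]:>2} of {retrieved[source]:>3} ({pct}%)")
print(f"overall: {included} of {unique_references} ({overall_yield}%)")
```

With so few relevant references, these percentages are only a coarse indication of how much screening effort each source required per included study.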

Quality assessment of the included studies

In traditional SRs, the quality assessment more or less replaces the relevance assessment as a method of influencing the principal source of bias. The quality of the study often determines the inclusion or exclusion in the final outcome presentation or meta-analysis. Several more-or-less rigorous check lists such as the CONSORT, Jadad, or GRADE criteria exist to assess the quality of clinical trials.20–22

By contrast, the systematic appraisal of the quality of interview research is characterised by great controversy rather than by a gold standard. For a more detailed analysis of the various pros and cons of using checklists in qualitative and quantitative interview research, see Walsh and Downe,23 Barbour24 and Giacomini.25 We suggest that if specific tools for quality assessment within SRs of interview research are applied, reviewers should explicitly justify their objective and provide a detailed description of the instrument.26 27 For instance, in our review of qualitative studies, quality assessment was performed to inform the reader about the characteristics of the included studies. We assessed the quality of the included studies based on a modification of the appraisal tool for qualitative research studies (CASP) developed by the Public Health Resource Unit (PHRU, See also table 3 (additional online resource). A similar modification of the CASP tool has proved to be effective in previous SRs on qualitative studies.28 29

Data analysis and data presentation

Traditional SRs are mainly concerned with combining outcome data from interventional studies. They involve techniques, such as meta-analyses, that are concerned with assembling and pooling data, and require a basic comparability between the phenomena studied so that the data can be aggregated for analysis.

In contrast, SRs of interview research in empirical bioethics aim to summarise the qualitative concepts and quantitative data identified in the primary studies without providing a final score that might decide about the effectiveness or ineffectiveness of a specific intervention. Interview studies are not interventional studies. In past years several articles raised questions about how to summarise or synthesise qualitative research findings.30 31 Further on, the approaches might differ in the way they analyse the data reported by the studies included in the review. Dixon-Woods and colleagues give an overview of possible methods such as narrative summary, thematic analysis, grounded theory, meta-ethnography, content analysis, qualitative comparative analysis and others.32 The methods vary in their ability to deal with qualitative and quantitative forms of evidence and in the type of question for which they are most suitable. A more comprehensive description and discussion of these methods is beyond the scope of this paper. However, the simple fact that these various approaches exist underlines the need for clear reporting of the methods chosen for analysing and summarising study findings in empirical bioethics.

For example, in our SR of qualitative interview research, we used the technique of thematic analysis to extract the qualitative data from the included studies. Though the main emphasis of the research question was somewhat different among those studies, all of them presented narrative accounts of physicians relating to the topic of HCR. The final result of the thematic analysis of these narrative accounts is a summary table that presents the wide range of themes and concepts that play a crucial role in bedside rationing according to the interviewed physicians. Like all qualitative research findings, the summary table is concerned with the generalisability of the range of themes and key issues emerging in the narrative accounts that were presented by the reviewed studies. Yet, it will not provide any statistical or other kind of quantitative generalisability. To assess the relative importance of each single key issue captured by such an SR, one needs an additional SR of quantitative interview research. The objective of such an SR of quantitative interview research is twofold. On the one hand, it provides the data for assessing the relative importance of key issues in a certain field of bioethics. On the other hand, we often face the situation that, at present, not all key issues have been studied for their relative importance. Data analysis within an SR of quantitative interview research, therefore, also aims to highlight the need for further research projects.


Due to the considerable methodological complexity and diversity in empirical bioethics—such as qualitative and quantitative interview research—there is a great need for the development of systematic and transparent techniques to synthesise and interpret their results. If suitably adapted to the peculiarities of the field, SRs of empirical bioethics provide a pertinent instrument that addresses this need by optimising transparency and systematisation in summarising empirical findings with impact on ethical reasoning and theory. The impact itself can take two different forms. Empirical findings can have a modificatory impact, if they give rise to important changes or transformations of ethical theory or reasoning. They can have a supportive impact, if they corroborate an ethical theory by analysing the factual practice to which the theory shall apply. The results of the SR of HCR, for instance, had a supportive impact on the normative ethical frameworks developed for HCR, especially for the requirements of consent, minimising conflicts of interest, and publicity.33 34 In contrast, the results had only a low modificatory impact for the frameworks’ norms.

In this article, we present a methodology for SRs that explicitly takes into account the specific features of interview studies in empirical bioethics. We illustrated the application of this methodology by presenting the results of our SR of interview research about HCR. The major recommendations for SRs in empirical bioethics are as follows:

  • As the PICO model of traditional SRs does not fit the specific aspects of interview research in bioethics, we developed the MIP model, which defines methodologies, issues and participants as the most important factors for developing a sound review question. For experimental, outcome-oriented studies, which to date are rare in empirical bioethics, the PICO model could also apply.

  • To achieve higher specificity and sensitivity in searching empirical studies relevant to bioethics, search terms have to be adapted to the specific keyword catalogues and indexing vocabulary in each bibliographic database. To become acquainted with the underlying mapping patterns of the database, we recommend looking for the database-related vocabulary used in indexing those relevant articles that were retrieved by prior non-systematic reviews or that are already known (index mapping).

  • To avoid different sources of bias during the relevance assessment, the reviewers that receive a list with the retrieved references should be blinded to the authors’ names, the journal and the year of publication. In addition, at least two experts should score the relevance of each reference in relation to predefined inclusion criteria and classification schemes.

  • Inter-rater reliability should be assessed and reported using intra-class correlation coefficients.

  • Because there is no gold standard to assess the quality of interview studies, reviewers who use any of the existing instruments should explicitly justify their objective and provide a detailed description of the instrument.

  • Within the data analysis, the reasons for the use of a certain method for summarising qualitative or quantitative interview findings should be explicitly stated.

These recommendations can help to increase the systematic character and transparency of reviews in the field of empirical ethics and therefore decrease the influence of various sorts of biases, such as bias in the literature search or in relevance assessment. Even though the methodology for SRs in empirical bioethics presented in this article was able to demonstrate its usefulness during our SR of interview research on HCR, the approach should not be considered definitive. Further experience with methodological variations and different issues could be helpful to further improve the interplay between empirical data and argument-based ethical reasoning. Finally, even if the risk of biases in the retrieval, interpretation, summary and communication of information from empirical studies in bioethics can be reduced by systematic reviews, we must not neglect the influence of other sorts of biases, such as sociocultural circumstances and subjective value judgements in ethical decision-making.


We would like to thank Marion Danis and Jon Tilburt for their critical review of the manuscript.




  • Funding: This work was supported by grant 01GP0608 from the German Federal Ministry of Education and Research and by a grant from the German Academic Exchange Service.

  • Competing interests: None declared.

  • The views expressed by the authors do not necessarily reflect policies of the US National Institutes of Health or the US Department of Health and Human Services.
