Abstract
The objective of explainable artificial intelligence systems designed for clinical decision support (XAI-CDSS) is to enhance physicians’ diagnostic performance, confidence and trust through the implementation of interpretable methods, thus providing a superior epistemic position, a robust foundation for critical reflection and trustworthiness in times of heightened technological dependence. However, recent studies have revealed shortcomings in achieving these goals, questioning the widespread endorsement of XAI by medical professionals, ethicists and policy-makers alike. Based on a surgical use case, this article challenges generalising calls for XAI-CDSS and emphasises the significance of time-sensitive clinical environments, which frequently preclude adequate consideration of system explanations. Therefore, XAI-CDSS may not be able to meet expectations of augmenting clinical decision-making in specific circumstances where time is of the essence. This article, by employing a principled ethical balancing methodology, highlights several fallacies associated with XAI deployment in time-sensitive clinical situations and recommends XAI endorsement only where scientific evidence or stakeholder assessments do not contradict such deployment in specific target settings.
- Ethics
- Information Technology
- Policy
- Decision Making
- Ethics - Medical
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.
Trust through transparency?
Explainable artificial intelligence systems designed for clinical decision support (XAI-CDSS) have been subjected to lively academic discussion, in-depth ethical scrutiny and scientific inquiry.1–10 They have been heralded for their potential to enhance physicians’ epistemic position, facilitate the resolution of disagreements and the detection of model errors, and improve AI adoption by safeguarding stakeholders’ trust and confidence, thereby ensuring ‘trustworthiness’.7–10 Until recently, however, scientific evidence evaluating XAI-CDSS remained limited, thus rendering such prospects purely hypothetical and a matter of conceptual and theoretical dispute.3 5 7 11 12
Empirical findings now increasingly indicate that XAI-CDSS influence the quality of human decision-making, the level of decision confidence and the formation of trust in system output to varying degrees.1 2 For instance, Chanda et al reported that AI support improved physicians’ diagnostic accuracy compared with no AI support, thus positively influencing human decision-making performance.1 However, the introduction of XAI did not improve physicians’ decision accuracy beyond the use of AI support alone.1 While physicians’ diagnostic confidence significantly increased with XAI support, their trust in the support system’s diagnoses depended on the overlap of their clinical reasoning with that of the XAI, which differed significantly between diagnoses.1 Similarly, Laxar et al found no relevant difference in weight-on-advice between the AI-CDSS that explained its recommendations and the one that did not, reporting overall low median levels of trust in the software employed and the model’s advice given.2
Although it remains challenging to draw definitive conclusions from these findings,1 the available evidence suggests that XAI-CDSS can fulfil their promise only to a limited extent and under specific circumstances.1 2 While several institutions have recommended that XAI methods be tailored to the respective clinical contexts,8 9 there are still generalising views that call for interpretable models without demanding an analysis of the specific clinical environments or circumstances before or during model deployment.13
In this article, we assert that transparency of medical AI models might not prove beneficial in certain clinical situations. In particular, the objective of this article is to examine the ethical challenges associated with XAI deployment in specific time-sensitive environments. While Kawamleh and Kiseleva et al have previously demonstrated that the extent of transparency required by law remains a fiercely debated topic,11 12 this article extends the prior art by drawing on several well-established ethical principles to analyse the most prevalent implications arising from the use of XAI methods in such clinical contexts.14 In consideration of the principles of beneficence, non-maleficence and transparency, this work examines potential barriers to effective, AI-based clinical decision-making in situations where time is of the essence.
Roadmap
In the first part of this article, we clarify important terms used throughout the analysis. We then briefly examine clinical decision-making from a psychological perspective and present a hypothetical example of a surgical XAI-CDSS which illustrates several challenges of XAI decision-support in specific time-sensitive clinical environments. Based on this section’s arguments, we proceed to elaborate on the exacerbating impact of time-sensitive circumstances and the challenges posed by domain shift and explanation errors. In conclusion, we examine the normative implications and present a comprehensive set of policy recommendations.
Clarifying terms
The term ‘Explainable Artificial Intelligence’, or simply ‘XAI’, has seen a notable number of definitions. Following the notions of several computer science experts, we hereby clarify that throughout this article, XAI shall refer to AI techniques comprising methods that both increase the transparency of algorithmic reasoning and allow for human interpretation of the system’s inner workings or generated output.1 Accordingly, we defer more fine-grained distinctions to those already working on the semantics of related terms.15 16 Therefore, we use the terms ‘explainability’, ‘interpretability’ and ‘transparency’ interchangeably. Specifically, we acknowledge that, among the established methods of achieving model transparency, two techniques are of key importance, namely inherently interpretable (ante hoc) and post hoc explainability methods.1 9 16–18 In the following examples, we focus on well-established post hoc explainability methods such as heatmap technology,3 17 19 acknowledging that more advanced techniques are gradually making their way into clinical practice.
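To make the notion of a post hoc heatmap more tangible for readers less familiar with the underlying techniques, the following minimal sketch illustrates one widely used approach, occlusion sensitivity, applied to an arbitrary black-box image classifier. The predict function, the image dimensions and the patch size are illustrative assumptions only and do not correspond to any specific system discussed in this article.

```python
import numpy as np

def occlusion_heatmap(image, predict, patch=16, baseline=0.0):
    """Post hoc explanation by occlusion sensitivity.

    Slides a blank patch across the image and records how much the model's
    confidence drops when each region is hidden; large drops mark regions
    the (black-box) model relies on for its prediction.
    """
    h, w = image.shape[:2]
    reference = predict(image)                 # confidence on the intact image
    heatmap = np.zeros((h, w))
    for y in range(0, h, patch):
        for x in range(0, w, patch):
            occluded = image.copy()
            occluded[y:y + patch, x:x + patch] = baseline
            heatmap[y:y + patch, x:x + patch] = reference - predict(occluded)
    return heatmap

# Illustrative stand-in for a trained classifier: this 'model' merely responds
# to mean brightness, so the sketch runs end to end without external data.
def predict(img):
    return float(img.mean())

frame = np.random.rand(64, 64)                 # hypothetical imaging frame
saliency = occlusion_heatmap(frame, predict)   # per-region importance map
```

In a clinical XAI-CDSS, such a map would typically be rendered as a coloured overlay on the original image; the ethical questions raised below concern whether users have the time to interpret it, not how it is computed.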
Additionally, the concept of ‘domain shift’ is associated with the difficulty that deep learning algorithms face in generalising to unseen or out-of-domain data distributions.20 This concept describes the problem of certain AI models inaccurately predicting an outcome because the distribution of the input data diverges from that of the training or validation data.1 20 For example, ‘domain shift’ can occur when a deep learning model processes patient characteristics that differ significantly from those on which the model was previously trained and validated.1 20 We also use the terms ‘time sensitivity’ and ‘time pressure’ interchangeably, for both convey a similar meaning of temporal scarcity or lack of time.
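As a rough illustration of how such a shift might be flagged in practice, the following sketch compares simple summary features of an incoming case against statistics of the training population. The features, threshold and data are illustrative assumptions only; real deployments rely on considerably more sophisticated out-of-domain detection.

```python
import numpy as np

def fit_reference(train_features):
    """Store mean and standard deviation of features seen during training."""
    return train_features.mean(axis=0), train_features.std(axis=0) + 1e-8

def is_out_of_domain(sample, mean, std, z_threshold=4.0):
    """Flag a sample whose features lie far outside the training distribution."""
    z = np.abs((sample - mean) / std)
    return bool(np.any(z > z_threshold))

# Hypothetical training population described by two image-level features
# (mean brightness, contrast).
rng = np.random.default_rng(0)
train = rng.normal(loc=[0.5, 0.1], scale=[0.05, 0.02], size=(1000, 2))
mean, std = fit_reference(train)

routine_case = np.array([0.52, 0.11])   # close to the training distribution
shifted_case = np.array([0.85, 0.30])   # e.g. a different optical system or patient group

print(is_out_of_domain(routine_case, mean, std))  # False
print(is_out_of_domain(shifted_case, mean, std))  # True
```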
Lastly, although the concept of trust remains ‘an important bedrock of society’ aligned with the rule of law,8 it is subject to a broad variety of definitions. For instance, there can be both trust in the patient–physician relationship and trust in the technology one is inclined to rely on.21–23 In this article, the term ‘trust’ shall be understood in accordance with the intentions of the respective authors cited.
Time-sensitive environments affect clinical decision-making
In clinical decision-making, time is a critical factor. In psychological terms, healthcare professionals facing complex medical questions require sufficient time to predict potential outcomes and weigh the importance of those outcomes in order to make informed decisions.24 Research has demonstrated that time pressure can increase performance speed but lower performance quality, as it promotes heuristic information processing that favours efficiency over complexity.25 A shortage of time may therefore result in patterns of intuitive decision-making, shaped by the psychological traits of the decision-maker. Although intuitive decision-making is often sustained by previous experience and pattern recognition, it is more prone to cognitive biases due to its heuristic foundation.26 27 Situations of time pressure can thus negatively influence the quality of the decision reached.28 Accordingly, the time sensitivity of a given medical environment, in addition to other psychological factors, can have a tremendous impact on professional judgement and patient safety.
Notably, many physicians regularly encounter situations of time pressure in their daily decision-making.29 Shortage of time and reliance on intuition seem to constitute an inherent aspect of the medical profession, often associated with emergency diagnostics, specific surgical interventions or inadequate resource management within the healthcare sector. While several of these factors can be mitigated through organisational measures, for example, by investing additional financial resources, others remain inherently and unchangeably time-critical, such as neuroradiological emergency diagnostics, myocardial infarction therapy or cardiac surgery. As XAI-CDSS promise to provide accurate and transparent decision recommendations that augment human judgement, it is important to assess the ethical implications of XAI-CDSS deployment in inherently time-sensitive interventional circumstances that rely on heuristic decision-making.
XAI deployment in time-sensitive environments
XAI-CDSS have been tested in various medical specialties, including dermatology,1 ophthalmology,30 cardiac imaging,18 radiation therapy,31 oncology32 and genetics.4 Importantly, the rapid progress of AI development has led to XAI training and validation in intensive care settings2 and surgical departments.33 34 These latter environments share a common element: time pressure. To understand the ethical significance of time-sensitive variables in the context of XAI decision support, we give a hypothetical example of an XAI-CDSS deployed in an inherently time-sensitive environment: cardiac surgery. The underlying clinical challenge is derived from both expert opinion and the literature.35 36 The objective of this example is to illustrate the most prevalent ethical implications of time-sensitive settings and to infer normative conclusions. Arguably, some clinical scenarios may be less impactful than the one portrayed in the following example. Despite the inherent variability of situations and the diversity of XAI model architectures, the merits of the following use case still underline the core arguments presented in this paper. Most implications of time-sensitive clinical environments are not merely the exception but rather the norm in clinical practice.
Consider an XAI model used in the cardiac surgery department to intraoperatively predict the size of an annuloplasty ring as well as the optimal suture pattern for implanting a mitral valve prosthesis, based on endoscopic imaging. Using data from two endoscopic optics, the algorithm measures several anatomical structures near the mitral valve, including the chords of the valve, and estimates the correct size of the prosthesis and the most promising suture pattern in real time. In addition, the system’s attention maps accentuate the pixels important to the model’s output with different colours, visually ‘explaining’ the prediction by indicating the relative importance of each pixel to the model’s output. This information is displayed on the screens of the surgical environment for transparent surgical decision support.
In general, this model should relieve surgeons from the manual labour of repeatedly adjusting templates to achieve an appropriately sized prosthesis. Assisting the surgeon both in selecting the appropriate size of the prosthesis and in identifying the optimal suture point within the predicted stitching pattern can reduce surgery time by several minutes. This is a significant improvement over conventional procedures performed under high time pressure constraints. Reducing surgery time means reducing patient risk exposure, which is the primary goal of this decision support system.
Now, the explanatory nature of this XAI-CDSS is unlikely to benefit surgical staff in achieving this goal.2 This surgical setting necessitates immediate patient management and frequently does not allow for a profound evaluation of the model explanations provided by its heatmap technology. Here, the cardiac surgeons will face extensive time pressure: ‘time is muscle’, and any delay in management could put tissue at risk. Surgeons and their assistants are unlikely to have the time to integrate and process the AI’s predicted outcome explanations during cardiac surgery, for example, by reflecting on the explanations’ accuracy in light of their clinical experience. In fact, if they took the time to study the AI’s explanations, the benefits of using this AI system would be negated, for the model’s biggest advantage lies in its time-saving properties.
Imagine the surgeons indeed considered these model explanations while performing the surgery. Doubts may arise regarding the predicted output, requiring additional evaluation time to verify the accuracy of the AI predictions and corresponding explanations. If they took the time to decipher which pixels the transparency-enhancing attention mechanism focused on, they would also need to evaluate whether those foci were congruent with their existing knowledge and prior experience. If an unexpected focus were presented, they would undoubtedly spend even more time evaluating such divergent output, essentially prolonging their decision on a correctly sized prosthesis. If the XAI explanations were to lower their confidence, surgeons would have to measure the ring size manually themselves. In contrast to the system’s time-saving benefits, any potential benefit of human re-evaluation through careful analysis of XAI explanations would require a minimum amount of evaluation time, which many surgeons arguably lack in such or similar circumstances.
In this situation, the XAI-CDSS does not provide any direct benefit, but rather extends the surgical procedure and the time in the operating room needed for its successful completion. In-depth considerations of the model’s explanations would not serve the primary goal of this XAI-CDSS, which is to limit intervention time. In fact, the time-sensitive nature of the surgery limits meaningful interactions with the explanations provided by the model. Its use would not only be ineffective but could also be harmful by prolonging intervention time, contravening the principle of non-maleficence.
In cases where time is of the essence, it may thus be more advantageous for a cardiac surgeon to rely on a high-performing, well-designed black-box model instead. This approach could improve immediate patient management without risking additional time for explanatory verification, benefiting both the patient and the surgeon. In such environments, the primary objective of the surgeons is to prevent patient harm and to perform the intervention in accordance with the standard of care, ensuring adherence to the principle of beneficence. It can be reasonably assumed that both principles of beneficence and non-maleficence would favour opaque but high-performing AI models over more transparent ones that may be more prone to impede surgical performance and patient health in such time-sensitive circumstances.
This example demonstrates that, in specific environments, the provision of transparent decision support may not represent the optimal solution. In situations where time is of the essence, there is a risk that XAI users may be unable to adequately evaluate the explanations provided by the system. Although this example is highly specific, its merits are transferable to other contexts, for there are numerous similar cases, including neuroradiological stroke diagnostics, rapid triage decisions, emergency endoscopic interventions, anaesthesiological rapid sequence inductions and gynaecological emergency caesarean sections, among others.
Do XAI-CDSS and time-sensitivity influence accuracy, confidence and trust?
As pointed out before, CDSS can enhance or diminish physicians’ clinical decision-making competence, affecting their ability to provide accurate assessments and sound clinical judgments.9 37 Specific model designs, such as interpretable models, pleiotropically influence decision metrics including physicians’ confidence in their own decisions and trust in the systems they use.1 2 33 38 In addition, several studies have reported phenomena such as automation bias and algorithmic aversion,37–39 both shown to decrease physicians’ decision accuracy.37 39 40
Time-sensitive circumstances may further negatively impact these clinical decision metrics. Some evidence already suggests that, under time pressure constraints, the use of XAI may have a greater impact on decision metrics than that of opaque black-box models.2 In addition, and as demonstrated in the example above, time pressure constraints can prevent XAI users from adequately considering the systems’ explanatory output. Among other things, this could decrease trust in the system, as healthcare professionals may be unable to verify the explanations provided within the allotted time frame, contrary to their expectation of understanding and rationalising the model’s predictions.5 41 Moreover, physicians’ and patients’ trust in the output may be unduly influenced by excessive reliance or unwarranted suspicion, which could have implications for physicians’ professional obligations to perform the respective intervention in a timely and less harmful manner (ensuring beneficence), as well as for patients’ physical integrity by not prolonging diagnostic or intervention time (ensuring non-maleficence). In the absence of sufficient opportunities to rationalise the XAI output, we question the ethical benefit of XAI-CDSS use in such environments, irrespective of whether one is inclined to consider model explanations, as one formulation of the principle of transparency, to be valuable for enhancing ‘trustworthiness’ or ensuring similar ethical benefits in less time-critical circumstances.
Time pressure aggravates challenges of domain shift and explanation errors
It is crucial to address both domain shift and explanation errors when using XAI-CDSS, as they are particularly amplified in time-sensitive scenarios. First, the domain shift problem can be exacerbated by time shortage. For example, this problem can occur when an AI model classifies imaging data not previously included in the training population without additional out-of-domain testing.20 Input data falling outside the model’s domain may decrease the accuracy of an XAI-CDSS recommendation, requiring AI users to exert additional cognitive effort to dismiss inaccurate predictions39 or use an alternative approach to collecting input data.42
To illustrate, if a surgical team uses XAI support and encounters complications from bleeding, the laparoscopic data generated in these circumstances may not have been previously included in the training data set. Assuming that the model has not been trained on comparable laparoscopic data, but rather on data acquired using a different optical system, the data points related to the bleeding complications are considered out-of-domain. The surgeon responsible for deciding on the proper treatment may know from personal experience that these complications constitute an out-of-domain scenario and that the system’s recommendations should, according to the manufacturer’s instructions for use, not be considered for domain shift reasons. Yet, in more ambiguous cases, the user may refer to the system’s recommendations and explanations to determine the origin of the bleeding and whether the complications are typical or out-of-domain. However, time constraints may lead the user to forgo the decision support altogether. Hence, mitigating the domain shift problem may be more challenging given time constraints.
Second, explanation errors can be particularly problematic for clinical decision-making in time-sensitive environments. Many interpretability methods currently produce error-prone explanatory output with unknown and potentially harmful consequences.17 For instance, heatmaps can occasionally mislead those seeking a correct explanation or information on the predicted output.43 Here, physicians would need to decide on whether the explanations are accurate, erroneous or reveal previously unknown correlation patterns. Certain very time-sensitive situations are likely to prevent detailed scrutiny and verification of specific explanations, rendering it even more difficult to determine the accuracy of a plausible explanation. Although some solutions attempt to address the challenges posed by domain shift and erroneous XAI explanations,1 44 the deployment of XAI-CDSS in specific time-sensitive circumstances should be questioned where significant temporal resources cannot be guaranteed.
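To illustrate what verifying an explanation can involve, the following sketch shows a crude faithfulness check (a simple deletion test): the pixels the heatmap marks as most important are hidden, and the resulting change in model confidence is measured. The stand-in model, heatmap and parameters are hypothetical and serve only to indicate why such verification consumes time that may not be available intraoperatively.

```python
import numpy as np

def deletion_check(image, heatmap, predict, fraction=0.1, baseline=0.0):
    """Crude faithfulness check for a saliency explanation.

    Hides the fraction of pixels the heatmap marks as most important and
    measures the drop in model confidence; a negligible drop suggests the
    explanation does not reflect what the model actually relies on.
    """
    k = max(1, int(fraction * heatmap.size))
    top = np.argsort(heatmap.ravel())[-k:]       # indices of the most 'important' pixels
    masked = image.copy().ravel()
    masked[top] = baseline
    return predict(image) - predict(masked.reshape(image.shape))

# Illustrative stand-in model and an arbitrary (possibly erroneous) heatmap,
# so that the sketch runs end to end without external data.
def predict(img):
    return float(img.mean())

frame = np.random.rand(64, 64)
candidate_heatmap = np.random.rand(64, 64)
print(deletion_check(frame, candidate_heatmap, predict))  # confidence drop when 'important' pixels are hidden
```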
XAI deployment is not warranted in case of reasonable doubt
The integration of the principles of beneficence and non-maleficence, and their balancing with the principle of transparency,14 indicate that the deployment of XAI-CDSS might not prove beneficial in specific clinical contexts that are very time-sensitive. Normative conclusions regarding the preference for highly performant and well-designed black-box AI over XAI should be reasonably applied to similar cases, taking into account the respective clinical context of deployment and the model design used.23 Given the lack of quantitative and qualitative studies on the specific topic of XAI-CDSS’ influence on decision metrics in time-sensitive environments, further evidence should be collected regarding the ability of current explainability methods to facilitate proper consideration by users, the effect of these methods on physicians’ accuracy, confidence and trust compared with conventional non-XAI systems, and whether user engagement changes with respect to the specific circumstances and the temporal resources provided.
Ghassemi et al have pointed out that XAI explanations can be particularly useful in model development, knowledge discovery and audit processes, but come with inherent limitations when applied to individual decision-making.17 Should stakeholders determine that an XAI architecture is still the optimal choice for decision support in the clinical setting, they are advised to conduct a more thorough examination of the clinical context to ascertain the extent to which the explanations provided by the system can enhance clinical judgement within the specific field of deployment.23 Moreover, it is recommended that policy-makers who generally advocate for XAI determine whether an exclusion of explainability methods is warranted or unjustified in the case at hand.
Consequently, it can be concluded that the regulatory demand for explainable AI should be limited to instances where its deployment in specific clinical circumstances does not give rise to reasonable doubt or does not contravene the findings of scientific research and evidence-based medicine.2 45 More generalising calls for XAI, such as those presented in several policy guidelines,10 13 require ethical reconsideration if they do not provide for exclusionary provisions and fail to consider the potential implications of XAI-CDSS in specific circumstances, as several guidelines and recent evidence suggest.8 9 23 45 In the event of reasonable doubts being raised, the deployment of XAI should be subjected to serious questioning.
Conclusion
In this article, we have argued that explainable model designs might not prove beneficial in several clinical circumstances. We highlighted that, in specific time-sensitive environments, XAI-CDSS may not live up to their promised potential and could instead negatively impact clinical decision-making. This is because time constraints may fundamentally impede adequate consideration of explanations and further exacerbate well-known fallacies of AI model deployment such as domain shift and erroneous explanations. By balancing the ethical principles of beneficence, non-maleficence and transparency, we investigated several challenges associated with XAI deployment in time-sensitive clinical situations. We find that the deployment of XAI should be endorsed solely in instances where stakeholder assessments do not raise reasonable doubts and empirical studies do not contradict such use in specific target settings characterised by stark temporal scarcity. In this regard, there is a continued need for further investigation of the impact of XAI-CDSS on clinical decision-making in specific time-sensitive clinical scenarios.
Data availability statement
Data sharing not applicable as no datasets generated and/or analysed for this study.
Ethics statements
Patient consent for publication
Acknowledgments
We would like to thank Birgit Shelton of Heidelberg University, Medical Faculty, for proofreading an earlier version of this article and suggesting improvements in writing style and format. In addition, we are indebted to our colleagues at the Section of Translational Medical Ethics for valuable discussions on an earlier draft of this article.
References
Footnotes
MH and ECW are joint senior authors.
Contributors AW, ECW and MH jointly developed the idea for this article. AW and MH analysed the literature that informed this article. ECW and MH provided important intellectual content to the conception of the article. AW wrote the first draft of the manuscript and adjusted the draft in response to the comments of ECW and MH. AW, ECW and MH jointly finalised the draft. All authors made substantial contributions to the conception, design, drafting and revision of the manuscript and are jointly responsible for the content of this article. ECW is the official guarantor. MH and ECW contributed equally to this paper.
Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.
Competing interests None declared.
Provenance and peer review Not commissioned; externally peer reviewed.
Such findings should be analysed with due care, given the inherent challenges of evaluating XAI methods owing to the lack of standardisation, benchmark datasets and metrics for measuring explanation quality.
Arguably, the only ones deriving some immediate benefit from the system’s capability of affording explanations would be the by-standing clinical interns, such as medical students on rotation.