
Testimonial injustice in medical machine learning
Giorgia Pozzi
Technology, Policy and Management, Delft University of Technology, Delft, The Netherlands
Correspondence to Giorgia Pozzi, Delft University of Technology, Delft 2628 BX, The Netherlands; g.pozzi@tudelft.nl

Abstract

Machine learning (ML) systems play an increasingly relevant role in medicine and healthcare. As their applications move ever closer to patient care and cure in clinical settings, ethical concerns about the responsibility of their use come to the fore. I analyse an aspect of responsible ML use that bears not only an ethical but also a significant epistemic dimension. I focus on ML systems’ role in mediating patient–physician relations. I thereby consider how ML systems may silence patients’ voices and relativise the credibility of their opinions, which undermines their overall credibility status without valid moral and epistemic justification. More specifically, I argue that withholding credibility due to how ML systems operate can be particularly harmful to patients and, apart from adverse outcomes, qualifies as a form of testimonial injustice. I make my case for testimonial injustice in medical ML by considering ML systems currently used in the USA to predict patients’ risk of misusing opioids (automated Prediction Drug Monitoring Programmes, PDMPs for short). I argue that the locus of testimonial injustice in ML-mediated medical encounters is found in the fact that these systems are treated as markers of trustworthiness on which patients’ credibility is assessed. I further show how ML-based PDMPs exacerbate and further propagate social inequalities at the expense of vulnerable social groups.

  • Ethics- Medical

Data availability statement

No data are available. Not applicable.


Introduction

Machine learning (ML) systems are increasingly being introduced in high-stakes fields such as medicine and healthcare. On the one hand, it has been shown that these systems hold great potential for improving the delivery of medical care, reaching high levels of accuracy and precision.1 On the other hand, it is also widely acknowledged that they give rise to salient ethical questions regarding, for example, patients’ autonomy, responsibility allocation and trust.2–4 This paper focuses on a less discussed but no less relevant epistemo-ethical issue: the role of ML systems in medicine in causing testimonial injustice, that is, a form of epistemic injustice.5 6

To make my case for testimonial injustice arising in connection with the use of ML in medical contexts, I consider ML systems deployed in the USA to predict patients’ likelihood of opioid addiction or misuse, that is, automated Prediction Drug Monitoring Programmes (PDMPs).7 8 I show that these systems’ role in medical decision-making can deflate patients’ credibility on epistemically invalid grounds, reducing the overall epistemic relevance of their testimonies and thus harming them in morally significant ways. In this way, I aim to show that patients are wronged as epistemic subjects, crucially due to how these systems mediate patient–physician interactions.

This paper is structured as follows. In the next section (2), I point out the problematic nature of ML-based PDMP risk scores and briefly introduce the problem of testimonial injustice in medicine. In section 3, I address the question of how PDMPs deflate patients’ credibility, and I argue that the main reason can be traced to the fact that PDMPs are treated as markers of trustworthiness. That is to say, I show that the risk scores generated by these systems are treated as crucial indicators on which assessments regarding patient credibility are formed. In section 4, I focus on the epistemic and ethical concerns that arise from treating these systems as markers of trustworthiness, and I argue that this practice is both morally and epistemically unjustified.

Testimonial injustice in medicine

‘I don’t think you are aware of how high some risk scores are in your chart’.7 With these words, a woman named Kathryn was discharged from a hospital in July 2020 while still in a precarious health condition. An algorithmic system (NarxCare algorithms7), which is supposed to deliver an accurate estimate of her likelihood of opioid misuse, turned out to be decisive in this outcome. In fact, these ‘law enforcement-developed digital surveillance systems’8 (p. 51) play an increasingly relevant role in physicians’ decision-making.1

In the aforementioned case, the risk score assigned to Kathryn led her physician to discharge her and the pharmacies to deny service to her. What made her situation even worse was that she could not overturn that unfavourable outcome with her knowledge of her own physical and mental condition. She knew that she was not addicted to opioids and had never misused drugs. However, the authority accorded to the automated system generating her risk score overrode the legitimacy of her testimony and limited her ability to contradict an inaccurate assessment of her drug consumption.

Even if PDMP scores are ideally meant to serve as a starting point for physicians to engage in a conversation with their patients, PDMPs in their current deployment de facto hinder rather than facilitate fruitful exchanges. Their black-box nature and the fact that legal action can be taken against physicians labelled as overprescribers lead healthcare professionals to over-rely on these systems.8 ,2 The bottom line is that Kathryn was excluded from a decision-making process of which, strikingly, she was intended to be the sole beneficiary.

Even in this brief sketch, the injustice suffered by Kathryn is palpable. However, going beyond the deep sense of injustice that our moral intuitions can capture, how can the nature of the wrong she experienced be conceptualised? Notice that Kathryn was not only denied access to her fair share of medical assistance but was also undermined in her role as a knower. This comes to light when considering that she was wrongfully disadvantaged in her ability to communicate relevant information about her health condition and to have this information recognised and acted on by medical professionals. I argue that the framework of epistemic injustice, particularly Fricker’s conceptualisation of testimonial injustice, can capture the moral and epistemic wrong that Kathryn suffered. In fact, the injustice she experienced cannot be understood exclusively in distributive terms. As Fricker points out5 and Symons and Alvarado extensively elaborate,6 epistemic injustice is not necessarily connected with the unfair distribution of goods (eg, access to information or, in the context of medicine and healthcare, medical professionals’ advice, support and care). Epistemic injustice is, rather, a discriminatory injustice in which a person’s epistemic status is unjustifiably diminished for epistemically invalid reasons (on which I elaborate below). Thus, the framework of epistemic injustice points to a more subtle form of harm that can easily go unnoticed. More precisely, it sheds light on the mechanisms underlying epistemically illegitimate reductions in a person’s credibility. Even if these instances can often be connected to other inequalities, they deserve to be considered in their own right.6 In this contribution, I consider the role played by ML-based PDMPs in causing this genuinely epistemic form of moral wrong. Before turning to this analysis, I reconstruct in more detail the main features of testimonial injustice.

Fricker’s analysis of testimonial injustice relies on the observation of discriminatory practices that question the epistemic status of individuals belonging to disadvantaged categories on the basis of unfounded prejudices. That is to say, an individual suffers testimonial injustice if, as a consequence of prejudices her interlocutor holds about her social identity, she receives less credibility than she would have received had those prejudicial judgements not been in place.5

A sadly common case of testimonial injustice is when a woman is attributed less credibility because of her gender. For example, in pain medicine, it has been shown that women’s pain is often underestimated, a phenomenon not encountered with the same frequency by men.9 This unfair treatment is due to an inappropriate withholding of credibility, often rooted in stereotypes that deflate women’s credibility levels. For instance, women are often perceived as more emotional and more apt to complain than men and are granted, for these reasons, less credibility overall.10–12 Racial biases can also be at the root of illegitimate credibility deficits, leading to unjust pain management. A study by Trawalter et al shows that black patients are often undertreated due to misguided beliefs regarding their ability to endure pain.13 The consequences of an unjustified lack of credibility can have a detrimental impact on patients’ general well-being, above and beyond the fact that they are wronged in their capacity as knowing subjects: the information they seek to convey is not taken seriously and does not get to inform the decision-making processes by which they are directly affected. As Fricker points out, certain stereotypes and prejudices related to one’s social identity are so entrenched in our social structures that they are not easily detected, let alone amended.5

The PDMP case previously described indicates that ML systems implemented in medicine and healthcare can create further imbalances in physicians’ assessments of their patients’ credibility. In the next section, I analyse how this happens and for what reasons, identifying the main features of testimonial injustice when it is induced by ML-based PDMPs.

Credibility in ML-mediated medical decision-making

A reciprocal relationship of trust between patients and their physicians is grounded, among other aspects, in respect for epistemic duties and rights. From a patient’s perspective, the latter amount to the right to receive relevant information regarding one’s health condition (eg, the results from tests the patient underwent, in understandable terms and free from complex technicalities), to convey knowledge about one’s mental and physical state, and to have the information one shares with physicians taken into account. Correspondingly, patients’ epistemic rights translate into epistemic duties for physicians. The latter are categorised by Watson as basic epistemic duties and comprise the duties to ‘seek, receive and impart information,’14 (p. 36) along with their negative counterparts (eg, avoiding seeking irrelevant information about a patient that exceeds the purpose of a diagnostic procedure). A successful patient–physician relationship holds as long as patients can trust physicians to respect their epistemic rights and physicians can trust their patients with the duty to be sincere when, for instance, patients report their symptoms.

Most situations in which we trust someone are situations of vulnerability for the trustor. In such situations, we defer to the trustee (ie, the person we (need to) trust), confident that they will fulfil the expectations that are implicitly or explicitly intrinsic to the trust relationship itself.15 ,3 While patients need to trust medical professionals’ expertise and beneficence, physicians also have to trust that the information patients provide about themselves (eg, their symptoms, the medication they use) is not deceptive.16 Therefore, it can be said that trust is closely related to credibility. In assessing a patient’s testimony, the unjustified withholding of credibility can be disadvantageous, potentially leading to injustice.4

Credibility assessments are particularly prone to being distorted by biases and stereotypes connected to a person’s social identity because, to form them, we usually rely on so-called markers of trustworthiness.5 In fact, when deciding whether to rely on someone’s testimony, we need a way to assess their epistemic trustworthiness. After all, we want to accept testimonial exchanges that are most probably truth-conducive, so that relying on a person’s testimony will lead us to form true beliefs about the world.17 However, identifying markers of trustworthiness that can successfully fulfil this epistemic task also has a considerable moral dimension. Testimonial injustice is often in place when the acceptance of certain markers of trustworthiness is driven by prejudicial assessments connected to one’s interlocutor’s social identity (ie, age, gender, ethnicity, social status).5

In medical encounters, markers of trustworthiness play a more or less important role depending on the situation. Typically, two different scenarios can be distinguished. On the one hand, when a patient’s reported symptoms are objectively connected to a visible and easily quantifiable cause—say, a patient reports pain and an X-ray shows a broken bone—credibility attribution happens in a quite straightforward manner, and the need to recognise markers of trustworthiness to assess the credibility of the patient’s testimony moves into the background. On the other hand, particularly epistemically and morally salient are cases in which patients’ reports of their symptoms are the main or only way in which physicians can gain epistemic access to their medical condition, in the absence of quantifiable and objectively recognisable physiological manifestations that could explain the patients’ complaints. This is often the case for chronic pain patients, patients who suffer from psychosomatic diseases and, more generally, in the clinical assessment of pain.18 In these cases, identifying suitable markers of trustworthiness is crucial to formulating appropriate credibility judgements that inform medical decisions, for instance, regarding whether to prescribe opioid medication.

Patients in need of opioid medications often belong to the categories just mentioned. Moreover, the possible stigma of drug addiction or misuse adds a further layer of complexity and a further inclination toward prejudicial judgement, which can easily deflate the credibility of a patient’s testimony even further. Given these difficulties, the air of objectivity and neutrality often (mistakenly) attributed to ML systems could seem like a viable solution. This is precisely the idea behind the implementation of systems such as NarxCare algorithms to manage the opioid epidemic pervading the USA. However, as is widely agreed in the AI ethics literature, ML systems are never value-neutral (see, eg, Mittelstadt et al 19), and their perceived objectivity can be highly misleading. I argue that automated PDMP systems exacerbate and reinforce—rather than mitigate—the occurrence of testimonial injustice in the particular case of interest. The reason is that they are crucially treated as markers of trustworthiness on which credibility assessments are often formed. If patients are not granted credibility and their testimonies are discredited because PDMPs are treated as markers of trustworthiness, then a form of testimonial injustice is induced by ML systems.

To substantiate the claim that PDMPs are treated as markers of trustworthiness, briefly consider how they operate. As I have extensively discussed elsewhere,20 these ML systems are epistemically opaque not only in the technical sense of the term: the unwillingness of the company that owns the NarxCare algorithms to reveal information regarding the weight and nature of the proxies informing patients’ risk scores also makes a critical assessment of the results they produce all but impossible.8

Despite this fact, automated PDMPs considerably influence medical decision-making. An empirical study by Leichtling et al shows that ‘(i)n response to worrisome PDMP profiles with new patients, participants [ie, clinicians using PDMPs] reported declining to prescribe, except in the case of acute, verifiable, conditions.’21 (p. 1063) This means that the information these systems produce about patients can easily override the epistemic value of their testimonies. Against this background, PDMP scores could, on occasion, be treated as if they were able to say everything that needs to be said about a patient’s level of drug consumption and eligibility for opioid prescriptions.

Hildebran et al 22 23 show how communication practices between patients and physicians tend to cut off testimonial exchange, creating an atmosphere of distrust that leads to medical decisions that can hardly be challenged by a patient in need of medical attention.20 This further supports the claim that credibility assessments are often formed by relying heavily on the scores provided by ML-based systems.

In the next section, I analyse why basing credibility assessments on PDMP scores is unjustified from both epistemic and moral viewpoints.

PDMP risk scores as markers of trustworthiness: epistemic and moral concerns

What has been said thus far points to the moral harm that misguided PDMP scores can cause. Treating these systems’ scores as markers of trustworthiness is epistemically and ethically unjustified for multiple reasons. First, it reflects an overestimation of what ML systems can achieve, indicating automation bias.24 It is essential, however, to recognise these systems’ limitations. ML systems are statistical systems that make predictions based on correlations established at the population level. These correlations are generated by connecting people who share certain salient attributes, and who are therefore categorised as part of the same reference class (eg, turning to a certain number of physicians for medical care, having experienced certain traumatising events, having a criminal history8), with the target class of interest (in this particular case, people who are likely to develop opioid addiction or misuse).25 Nevertheless, it is not an epistemically legitimate step to move from the population level to the individual level without further consideration and without taking into account the particular situations and values of patients in their singularity.

This means that these predictions can be highly misleading, connecting attributes that pertain to a certain category of patients but are not necessarily connected to an individual’s drug consumption or possible tendency to misuse opioids.20 Moreover, using, for example, criminal history as a proxy informing patients’ final risk scores disadvantages racial minorities, who tend to be underinsured and overpoliced in the USA compared with white people.8 Consequently, these systems tend to misclassify already disadvantaged social groups and play a crucial role in further exacerbating their inability to counteract the testimonial injustice connected to wrongful credibility assessments made on the basis of their risk scores.
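To make the reference-class worry more concrete, consider the following minimal sketch in Python. It is a purely hypothetical illustration: the actual proxies and weights used by systems such as NarxCare are undisclosed, so every variable, weight and threshold below is invented. The sketch only shows how a score assembled from population-level correlations attaches to an individual patient, irrespective of her particular circumstances.

```python
# Hypothetical illustration only: the real PDMP proxies and weights are
# proprietary and undisclosed. This sketch shows how a score built from
# population-level correlations is applied to an individual patient.

from dataclasses import dataclass

@dataclass
class PatientRecord:
    num_prescribers: int        # physicians consulted for prescriptions
    num_pharmacies: int         # pharmacies where prescriptions were filled
    has_criminal_history: bool  # proxy criticised for encoding over-policing
    prior_overdose: bool

# Weights of this kind would be estimated from population-level correlations
# between the proxies and the target class ("likely to misuse opioids").
# The values below are invented for illustration.
WEIGHTS = {
    "num_prescribers": 8.0,
    "num_pharmacies": 6.0,
    "has_criminal_history": 25.0,
    "prior_overdose": 40.0,
}

def risk_score(p: PatientRecord) -> float:
    """Return a hypothetical 0-100 'opioid misuse risk' score."""
    raw = (
        WEIGHTS["num_prescribers"] * p.num_prescribers
        + WEIGHTS["num_pharmacies"] * p.num_pharmacies
        + WEIGHTS["has_criminal_history"] * p.has_criminal_history
        + WEIGHTS["prior_overdose"] * p.prior_overdose
    )
    return min(100.0, raw)

# A patient who legitimately consulted several specialists for a complex
# condition inherits the risk attached to the reference class "many
# prescribers, many pharmacies", even though nothing in her individual
# history involves misuse.
patient = PatientRecord(num_prescribers=5, num_pharmacies=3,
                        has_criminal_history=False, prior_overdose=False)
print(risk_score(patient))  # 58.0: driven purely by reference-class membership
```

Nothing in such a computation registers why the patient saw several prescribers or used several pharmacies; she simply inherits the risk profile of her reference class, which is precisely the epistemically illegitimate move from the population level to the individual level discussed above.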

A second epistemically and ethically relevant issue that the framework of testimonial injustice successfully captures is that these systems can result in patients being silenced and deprived of a major epistemic right: the right to actively convey information. Strikingly, these systems displace physicians from their authoritative position,7 8 while, from a patient’s perspective, they mark a shift back to a more paternalistic approach in medicine. Physicians are thus less empowered because, as previously pointed out, their medical decisions can be considerably influenced by ML outcomes, and patients are less involved in decision-making since the credibility of their testimonies is strongly deflated by the risk scores assigned to them. They thus end up being mere recipients of medical decisions aligned with their risk scores. The bottom line is that these systems constrain epistemic participation, possibly undermining the widely accepted principle of shared decision-making in medicine.26 ,6

Furthermore, when PDMPs are treated as markers of trustworthiness, the testimonial injustice they perpetrate can no longer be considered interpersonal (ie, happening between two or more interlocutors) but rather assumes a structural character: it is not in the hands of a single physician to look beyond a risk score generated by the system, as physicians are limited in their epistemic and moral agency by these systems.20 Expecting physicians to base medical decisions on largely unchangeable and flawed risk scores thus becomes an institutionalised procedure. It follows that the injustices perpetrated go beyond the episodic instances of testimonial injustice that can occur in human–human interactions. As Anderson points out, ‘(t)estimonial exclusion becomes structural when institutions are set up to exclude people without anyone having to decide to do so.’27 (p. 166) As a direct consequence, a ‘contestability vacuum’ emerges: the impossibility of achieving recourse against an inaccurate risk score, together with the scale at which the harm caused by these systems propagates, is what makes ML-induced testimonial injustice particularly harmful.6 This is not to say that human credibility assessments are always flawless; they can, of course, be just as biased. However, in a non-ML-mediated scenario, the biased decision of a single prejudiced doctor who is disinclined to provide a patient with pain medication for epistemically invalid reasons (eg, the patient’s gender) can, at least in principle, be spotted and amended by unbiased physicians involved in the medical decision-making process. In contrast, ML systems such as PDMPs can systematise inequality so that contestation escapes the possibilities of single individuals. The shift from an interpersonal to a structural dimension thus bears a significant moral component.

Final remarks

In this paper, I have considered how the use of ML systems such as automated PDMPs as markers of trustworthiness that inform physicians’ judgements of patients’ credibility brings about instances of testimonial injustice in ML-mediated medical practices. These considerations raise the question of whether using ML systems to identify patients at risk of drug abuse is epistemically and morally legitimate.

I maintain that it is not, because striking the right balance in credibility assessments is paramount in this kind of practice. Using these systems can shift the balance to patients’ disfavour, particularly for those belonging to already disadvantaged social categories. If I have been successful in showing the epistemic and ethical limitations of these systems in terms of testimonial injustice, it is clear that trying to ameliorate the opioid crisis by outsourcing delicate decisions to ML systems fails in its purpose.

While this paper shows the limitations of systems such as automated PDMPs, it does not provide possible solutions. However, having a clearer grasp of the problem is a necessary step towards the development and deployment of ML systems that respect the fundamental principle of justice, not only in its ethical but also in its epistemic significance.


Ethics statements

Patient consent for publication

Acknowledgments

I am very grateful to Juan M. Durán and Jeroen van den Hoven for their feedback on a previous version of this paper and for the many fruitful discussions on the topic of this work. I would also like to thank the two anonymous reviewers of the Journal of Medical Ethics for their very helpful comments.

References

Footnotes

  • Contributors GP is the sole author and guarantor.

  • Funding This work was supported by the European Commission through the H2020-INFRAIA-2018-2020/H2020-INFRAIA-2019-1 European project “SoBigData++: European Integrated Infrastructure for Social Mining and Big Data Analytics” (Grant Agreement 871042). The funders had no role in developing the research and writing the manuscript.

  • Competing interests None declared.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • The extent to which automated PDMPs influence medical decision-making varies from state to state, according to different provisions (eg, 13 states mandate that healthcare workers consult PDMP records, while in other states physicians check PDMPs only in cases that are deemed suspicious21 28). Even if these systems might not (yet) be the predominant method of practice overall, empirical studies have shown that the concern that they are leading to a worrisome shift in medical practice is legitimate.29 As Oliva points out, ‘due to state PDMP use mandates and law enforcement surveillance, clinicians (…) increasingly rely on PDMP risk scores to diagnose and treat patients. And there is little doubt that such clinical reliance will become even more pervasive’8 (p. 109). I thank an anonymous reviewer for encouraging me to clarify the extent to which automated PDMPs are currently used in medical practice.

  • As I elaborate in more detail in section 4, one can argue that the objectivity and neutrality wrongfully attributed to these systems have the potential to deflate the value of patients’ testimonies in medical decisions. Haines et al point out that healthcare providers are ‘more likely to accepting the default settings of automated systems at the expense of other relevant emotional and psychological patient information.’30 They go on to state that this ‘can result in errors of commission, where the value of information attributed to the automated tool overrides the value of clinical expertise, even where the automated information contradicts clinical training and evidence’ (p. 2). This corroborates the claim that even if automated PDMPs are intended as decision support systems, they considerably influence physicians’ judgements in crucial decision-making practices, so that physicians often end up following these systems’ recommendations (particularly in combination with other features of medical practice, such as time limitations). Under these conditions, the value of patients’ testimony in shared decision-making is likely to be obfuscated by the scores attributed to them. I thank an anonymous reviewer for encouraging me to clarify this important point.

  • In a patient–physician relationship, a physician’s epistemic duties are rendered explicit by institutionalised practices (eg, the Hippocratic Oath), codes of conduct and the four fundamental biomedical principles (ie, beneficence, non-maleficence, justice and respect for autonomy).

  • Let me clarify that avoiding epistemic injustice does not require physicians to take patients’ testimonies at face value. If a physician has valid reasons to deem a patient epistemically untrustworthy, she is, of course, entitled to disregard her testimony (without infringing her epistemic rights). Crucially, testimonial injustice occurs if the reasons a patient’s testimony is disregarded are epistemically invalid, such as in a case of unfounded prejudice related to a patient’s social identity. I thank an anonymous reviewer for encouraging me to clarify this relevant point.

  • For example, Fricker refers to the characteristic of being a gentleman in seventeenth-century England as a marker of trustworthiness. Conversely, the lack of this characteristic in women, for example, was taken to be the opposite, that is, a marker of untrustworthiness. This is an example of a non-epistemically grounded marker of (un)trustworthiness5 (p. 119).

  • There is research showing that, in some cases, patients were not even informed that automated systems were involved in the decision-making and that medication denial was based on their risk scores.7 8 This can be seen as contrary to practices of shared decision-making in medicine. More needs to be said about how ML-based systems impact shared decision-making. However, due to the limited scope of this paper, I cannot pursue this issue further.
