Since their public launch a little over a year ago, large language models (LLMs) have inspired a flurry of analysis about what their implications might be for medical ethics, and for society more broadly.1 Much of the recent debate has moved beyond categorical evaluations of the permissibility or impermissibility of LLM use in different general contexts (eg, at work or school), to more fine-grained discussions of the criteria that should govern their appropriate use in specific domains or towards certain ends.2 With each passing week, it seems more and more inevitable that LLMs will be a pervasive feature of many, if not most, of our lives. It would not be possible—and would not be desirable—to prohibit them across the board. We need to learn how to live with LLMs; to identify and mitigate the risks they pose to us, to our fellow creatures and to the environment; and to harness and guide their powers to better ends. This will require thoughtful regulation and sustained cooperation across nations, cultures and fields of inquiry; and all of this must be grounded in good ethics.
The power, reliability and range of applications of LLMs are growing. This, in turn, has led to a gradual shift in how they are evaluated by researchers and other stakeholders. While initial efforts focused on assessing the broad capabilities of base models, such as ChatGPT V.3.5, there is now an increasing emphasis on equipping these models with domain-specific expertise. These ‘custom’ LLMs are enhanced with specialised data sets to improve their performance in niche areas, including, within the medical domain alone, diagnostics, mental health counselling, patient intake, procedural consent and even clinical-ethical analysis (see below). One approach to building in such specific expertise is ‘retrieval-augmented generation’,3 which allows an LLM to access information stored in a carefully structured custom database. If done well, this approach can partially alleviate concerns about so-called ‘hallucinations’ and potential biases. Another approach involves exposing the LLM to further training on a specialised data set, thereby adapting the model itself to the style, format or content of the data on which it was fine-tuned.
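To make the retrieval-augmented approach concrete, the following is a minimal sketch in Python. The guideline snippets, the word-overlap retrieval and the prompt template are illustrative assumptions only; a production system would use vector embeddings over a curated database and call a real LLM in place of the placeholder generate() function.

```python
import re

# Minimal sketch of retrieval-augmented generation (RAG).
# The guideline snippets, word-overlap scoring and prompt template are
# illustrative assumptions; a real system would use vector embeddings
# and call an actual LLM instead of the placeholder generate() below.

GUIDELINES = {
    "consent": "Before consenting, the patient must be told about the "
               "procedure, its risks, benefits and alternatives.",
    "privacy": "Identifiable patient data should not be shared outside "
               "the approved clinical record system.",
}


def words(text: str) -> set:
    """Lower-case word set, ignoring punctuation."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(question: str) -> str:
    """Return the stored snippet with the largest word overlap with the question."""
    overlap = lambda key: len(words(question) & words(GUIDELINES[key]))
    return GUIDELINES[max(GUIDELINES, key=overlap)]


def generate(prompt: str) -> str:
    """Placeholder for an LLM call; here it simply echoes the grounded prompt."""
    return f"[model answer, grounded in]: {prompt}"


question = "What must the patient be told about the risks of the procedure?"
prompt = f"Answer using only this source.\nSource: {retrieve(question)}\nQuestion: {question}"
print(generate(prompt))
```

Because the model is asked to answer only from the retrieved source, its output is easier to trace and check, which is the sense in which retrieval augmentation can partially alleviate hallucination concerns.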
This process of fine-tuning allows for the creation and use of ‘personalised’ LLMs—a growing topic of interest in bioethics. Personalised LLMs are bespoke adaptations of base models that have been selectively fine-tuned on data produced by, describing, or otherwise pertaining to a specific individual. Prototypes of personalised LLMs include DigiDan,4 a version of GPT-3 fine-tuned on the corpus of philosopher Daniel Dennett, and AUTOGEN,5 a GPT-3 model fine-tuned on the academic output of three bioethicists (including two of the present authors). DigiDan can produce novel text that is convincingly similar to Dennett’s writing, and AUTOGEN can produce drafts of new academic articles, from a proposed title and abstract alone, in the style and manner of argumentation of the authors on whose work it was trained.
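For readers curious about what fine-tuning on a personal corpus involves in practice, the sketch below uses the OpenAI fine-tuning API as one possible route; the file name, the prompt framing and the choice of base model are assumptions for illustration, not a description of how DigiDan or AUTOGEN were actually built.

```python
# Hedged sketch of fine-tuning a base model on one author's corpus.
# Assumes the OpenAI Python SDK (v1+) with an API key in the environment;
# 'personal_corpus.jsonl' and the prompt framing are hypothetical, and a
# real run would require many more training examples than shown here.
import json
from openai import OpenAI

# Each training example pairs a title/abstract prompt with the author's own text.
examples = [
    {"title": "An example title", "abstract": "An example abstract...",
     "article": "Full text written by the author..."},
]

with open("personal_corpus.jsonl", "w") as f:
    for ex in examples:
        record = {"messages": [
            {"role": "system", "content": "Write in the style of author X."},
            {"role": "user", "content": f"Title: {ex['title']}\nAbstract: {ex['abstract']}"},
            {"role": "assistant", "content": ex["article"]},
        ]}
        f.write(json.dumps(record) + "\n")

client = OpenAI()
upload = client.files.create(file=open("personal_corpus.jsonl", "rb"), purpose="fine-tune")
job = client.fine_tuning.jobs.create(training_file=upload.id, model="gpt-3.5-turbo")
print(job.id)  # the resulting fine-tuned model can then be prompted like any other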
While ethical concerns around the personalised generation of text using LLMs are now fruitfully being analysed, an area of equal importance that has received less attention is the personalised consumption of text. One of the greatest long-term impacts of LLMs on bioethics may be the ability of these models to synthesise, modify and present information in ways that suit the preferences, abilities or learning styles of the individual.6 For example, paraphrasing and text-to-speech functionality can be harnessed to have journal articles summarised at a comfortable reading level and read aloud in an engaging voice—perhaps that of a favourite celebrity.
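As a rough illustration of that kind of personalised consumption, the sketch below summarises a passage with an off-the-shelf model and reads the result aloud with an offline speech engine. The libraries (Hugging Face transformers, pyttsx3), the default summarisation model and the sample passage are assumptions, and matching a particular voice is well beyond this sketch.

```python
# Hedged sketch: summarise a passage and read it aloud.
# Assumes the 'transformers' and 'pyttsx3' packages are installed;
# the default summarisation model is downloaded on first use, and
# pyttsx3 relies on whatever speech voices the operating system provides.
from transformers import pipeline
import pyttsx3

article_text = (
    "Large language models can tailor how medical information is presented to readers. "
    "They can adapt the reading level, the format and the voice in which material is "
    "delivered to suit the preferences and abilities of the individual."
)

summarizer = pipeline("summarization")  # uses the library's default model
summary = summarizer(article_text, max_length=40, min_length=10)[0]["summary_text"]

engine = pyttsx3.init()   # offline text-to-speech using system voices
engine.say(summary)
engine.runAndWait()
```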
But these uses of LLMs, both to generate and consume text, can go well or poorly. Outputs may be interesting and valuable—or actively harmful, as with disinformation. When a human ‘collaborates’ with an LLM to produce good or bad outcomes, how shall praise and blame be allocated? Discussion is now growing around a potential credit-blame asymmetry in moral judgements of outcomes facilitated by LLMs.7 This concept, while applying more broadly than to medical ethics, is highly relevant to it. In scenarios where LLMs contribute significantly to positive outcomes in a clinical context, for example, healthcare providers might not receive much credit, especially if their personal input is minimal. Conversely, they might face substantial blame if reliance on an LLM’s recommendation leads to a negative outcome and is considered reckless or negligent.
Although it is too early to tell how moral debates around the use of LLMs will unfold over the long-term horizon, some themes are likely to be central to future discussions. This issue of the JME includes papers that discuss the privacy implications of LLMs, their potential role in patient consent, in clinical decision-making, and in generating knowledge about medical ethics itself.
Informed consent is a central concern of medical ethics and the prospect that LLMs might play a role in facilitating patient consent merits careful analysis. LLMs have the potential to enhance consent processes by tailoring communications to a patient’s background, understanding and cultural context. However, questions arise about their ability to fully grasp and respond to the nuances of human understanding and emotions. This leads to further discussions about the legal and ethical adequacy of LLM-generated recommendations in the complex process of obtaining informed consent in medical settings. In this issue of the JME, Allen et al consider such issues in their paper on ‘delegated procedural consent’.8 They consider several important ethical questions that need to be answered before LLMs should play a role in delegated consent, but conclude that…
…all of these concerns also apply to other uses of medical technology, and indeed apply to the current practice of consent delegation to junior doctors. Thus, they do not provide reason to reject the use of LLMs out of hand.
From their perspective, there is no in-principle reason why we should not use LLMs to enhance procedural consent, and doing so has the potential to benefit patients: for example, a patient could spend longer at home with a custom-built ‘Consent GPT’ app—an up-to-date LLM that has been fine-tuned on their specific procedure—than they could with (only) a rushed junior doctor at the hospital.
Patient privacy is another crucial topic. LLMs’ effectiveness in medical practice or ethics is closely tied to their access to extensive personal and potentially sensitive patient data, raising issues around data security, consent for data usage and risks of data breaches. There is a continuous effort to develop secure protocols and technologies to protect patient information while reaping the benefits of personalised medicine and medical ethics. Blease outlines the significant benefits that could result from both LLMs and online record access (ORA), including allowing patients to be more in control of their information and more knowledgeable about their own healthcare. However, she warns that the…
ORA access combined with the intrinsicality of internet use in daily life, surveillance capitalism, the pressures on health systems, and the challenges of readily availing of health services, create the perfect storm for privacy exposures. It is imperative for patients and clinicians to become more aware of these privacy risks. Health systems should also, as a matter of exigency, outline policies that uphold privacy in the use of LLM chatbots to assist clinicians with documentation.9
While members of the public seem aware that there are privacy risks from online access to health information, the ways in which this is likely to develop rapidly with the aid of LLMs mean that a close watch must be kept on potential breaches of privacy. To that end, there is an emerging trend towards running LLMs locally and offline,10 a development that could address privacy concerns by allowing data processing and model training to occur in environments controlled by the end-user, rather than on remote servers.
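By way of illustration, the sketch below runs a model entirely on the user's own machine using the open-source llama-cpp-python library; the model file path and the prompt are placeholders, and any open-weights model in GGUF format could stand in.

```python
# Hedged sketch of running an LLM locally, so that no text leaves the
# user's machine. Assumes the 'llama-cpp-python' package and a locally
# stored open-weights model in GGUF format; the path is a placeholder.
from llama_cpp import Llama

llm = Llama(model_path="./models/local-model.gguf")  # loaded from local disk

prompt = "Q: What are the main privacy risks of online record access? A:"
output = llm(prompt, max_tokens=64, stop=["Q:"])

print(output["choices"][0]["text"])  # inference happens offline, on local hardware
```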
The debate also examines the depth and accuracy of LLMs in medical ethics and medical decision-making. Despite being trained on vast data sets, including medical literature and ethical guidelines, concerns remain about their ability to apply this knowledge to real-world situations. Discussions focus on training methods, potential biases in training data and the models’ capacity to adapt to the evolving field of medical ethics. Experts are exploring ways to continuously update and refine LLMs to align their recommendations with current ethical standards and medical practices.
The pace of advancement in the field of LLMs is remarkably brisk. Crucially, many of the currently published assessments are of ChatGPT V.3.5, not V.4. The latter is marked by substantial improvements in performance and reasoning capabilities, as well as in standardised test scores. Nonetheless, writing about ChatGPT V.3.5 in this issue of the JME, Chen et al express a note of caution.
It is ethically imperative for medical professionals, bioethicists, and educators to ensure that such AI systems are used judiciously, with human oversight always in the decision-making loop. This concern is especially apparent in situations where the stakes of misjudgment are high.11
LLMs have significant potential to play some role in clarifying and enhancing reasoning in medical ethics, but their present limitations mean we should be cautious in applying them to this end. Balas et al express similar reservations in their paper in this issue.
While LLMs have shown promise in understanding and generating text based on the vast amount of data they are trained on, there is a clear distinction between processing information and understanding the deep, often context-dependent nuances of ethical dilemmas.12
Ethicists might hope that LLMs cannot yet achieve what is broadly considered to be good, insightful or sound reasoning about medical ethics, and currently that seems to be the case. Producing original, deep or genuinely normative insights about ethics requires more than LLMs can do at present.
It is difficult to predict all of the likely uses of LLMs, and it would perhaps be foolish to speculate too much about the ethical issues that we should be preparing for. Even so, we think it is likely that the debate will move on to other ways in which LLMs can support patients and decision-making. LLMs have a potential role to play, for instance, in substitute decision-making for incapacitated patients—especially those without next-of-kin to serve as proxies. Alternatively, if incorporated into advance decision-making, LLMs could make recommendations for patients who cannot currently communicate their wishes but who have previously trained an LLM on extensive personal data about themselves and authorised its use for this purpose.13
The use of personalised LLMs for psychiatric and emotional support is also likely to generate intense discussion. This includes their potential role in simulating conversations with deceased loved ones for patients suffering from grief or loneliness. The ethical considerations surrounding such applications, particularly regarding emotional dependence and the authenticity of human relationships, will be complex and multifaceted. The JME will continue to consider high-quality papers that analyse the ethical issues generated by future applications of LLMs.
Ethics statements
Patient consent for publication
Ethics approval
Not applicable.
Footnotes
Twitter @hazemzohny
Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.
Competing interests None declared.
Provenance and peer review Commissioned; internally peer-reviewed.