The authors put forward an interesting response to detractors of black box algorithms. According to the authors, what is of ethical relevance for medical artificial intelligence is not so much its transparency, but rather its reliability as a process capable of producing accurate and trustworthy results. The implications of this view are twofold. First, it is permissible to implement a black box algorithm in clinical settings, provided the algorithm’s epistemic authority is tempered by physician expertise and consideration of patient autonomy. Second, physicians are not expected to possess exhaustive knowledge or understanding of the algorithmic computation by which they verify or augment their medical opinions. The potential of these algorithms to improve diagnostic and procedural accuracy alongside the quality of patient decision-making is undoubtedly a boon to modern medicine, but blind deference to them is neither feasible nor responsible, as several logistical and ethical quagmires noted by the authors remain inherent in algorithmic software.
I concur with the authors on their central thesis concerning computational reliabilism.1 Properly designed algorithms fed properly vetted informational diets have demonstrated substantial gains in accuracy compared with human counterparts.i I see no prima facie reason why opacity alone ought to impugn their epistemic status. As the authors aptly note, physicians are already held responsible for operating machinery they do not fully understand (eg, MRI scanners). Moreover, in cases where such machines malfunction, the supervising physician (or hospital) is held legally culpable. It seems then that an acceptable degree of epistemic opacity is already inherent in the administration of ordinary healthcare. Indeed, some might even make the stronger claim that physicians are somewhat opaque to themselves, insofar as the deeper neurological or heuristic mechanisms by which they come to particular medical opinions remain inaccessible.
While I grant physicians are epistemically and normatively justified in trusting properly supervised algorithms, and that it would be overdemanding to expect physicians to double as programmers or computer engineers, I do not share the authors’ confidence that the responsibility gap has been satisfyingly dispensed with.
The responsibility gap is often cashed out as a simple question of ‘who do we blame?’ with an emphasis on deciding who ought to pay restitution to the victim (eg, the hospital, the physician, the programmers or the parent company of the software). This is both a moral and a legal question. Even if one answers this question, though (the authors consider the physician sufficiently accountable for algorithmic hiccups), it seems there is an essential condition of redress which remains impossible, namely the implementation of any systematic preventive measures against future recurrence. In the event of a mistake caused by an algorithm’s blind spot, could any procedure be put in place to ensure that mistakes of the same sort do not recur?
The gold standard of reducing medical errors, a systems-based approach, relies on identifying and remedying underlying factors which contribute to or increase the likelihood of a given error recurring. A systems approach is critically dependent on the ability to perform a thorough autopsy of the failure and devise an effective solution. This process threatens to become impossible when the algorithm principally responsible for causing the failure is opaque to postmortem investigation. In the case of black box algorithms, are we left to simply wring our hands and accept that the software is, say, 97% accurate, with an unidentified but statistically certain 3% of patients who will continue to be victims? Many algorithms relying on supervised learning adjust the weights assigned to particular variables after mislabelling a given input datum, as a way of learning and improving over time. Nonetheless, it is possible this recalibration may not be sufficient to correct the flaw outright or in a timely manner (it may take many more repeats of the same mistake to calibrate accurately, and it may never properly calibrate).
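The worry about slow recalibration can be made concrete with a toy sketch. The following is an illustration only, assuming a simple perceptron-style update rule; it is not the method of any clinical system discussed here, and the function name, weights, input and learning rates are all hypothetical. It counts how many times the same mistake must be repeated before the weights shift enough to classify that one case correctly.

```python
def updates_until_correct(weights, x, label, learning_rate, max_updates=1000):
    """Count perceptron-style updates needed before input x is classified
    as label (+1 or -1). Returns (count, final_weights); count is None if
    the case is still misclassified after max_updates corrections.
    Hypothetical toy model, not a clinical algorithm."""
    w = list(weights)
    for n in range(max_updates + 1):
        score = sum(wi * xi for wi, xi in zip(w, x))
        predicted = 1 if score > 0 else -1
        if predicted == label:
            return n, w
        # The case was mislabelled: nudge each weight towards the true label.
        w = [wi + learning_rate * label * xi for wi, xi in zip(w, x)]
    return None, w


# The same misclassified case, corrected at two different learning rates.
fast, _ = updates_until_correct([-1.0, 0.0], [1.0, 1.0], 1, learning_rate=0.25)
slow, _ = updates_until_correct([-1.0, 0.0], [1.0, 1.0], 1, learning_rate=0.0625)
print(fast, slow)  # 3 9
```

In this toy setting the error is repeated three times at the larger learning rate and nine times at the smaller one before it is corrected. Nothing in the sketch explains why the case was misclassified in the first place, which is the gap a postmortem investigation would need to fill.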
The moral cost of being powerless to prevent future recurrences of algorithmic medical error (even if these errors are very rare) cannot be overstated, as it is integral for a medical community to admit failure, promise to do better and give victims the peace of mind that their fate will not be needlessly suffered by others.
Contributors BL is the sole author of this commentary.
Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.
Competing interests None declared.
Provenance and peer review Commissioned; internally peer reviewed.
Diagnostic software designed by Google to perform mammogram cancer screenings produced a 9.4% reduction in false negatives, and a 5.7% reduction in false positives relative to the standard margin of error among US radiologists. See McKinney et al. 2