Addressing bias in artificial intelligence for public health surveillance
  1. Lidia Flores (1)
  2. Seungjun Kim (1)
  3. Sean D Young (1, 2)

  (1) Department of Informatics, University of California Irvine, Irvine, California, USA
  (2) Department of Emergency Medicine, School of Medicine, University of California, Irvine, Irvine, CA, USA

Correspondence to Sean D Young, Department of Emergency Medicine, University of California Irvine, Irvine, USA; syoung5{at}


Components of artificial intelligence (AI) for analysing social big data, such as natural language processing (NLP) algorithms, have improved the timeliness and robustness of health data. NLP techniques have been used to analyse large volumes of text from social media platforms to gain insights into disease symptoms, understand barriers to care and predict disease outbreaks. However, AI-based decisions may contain biases that misrepresent populations, skew results or lead to errors. Bias, within the scope of this paper, is the difference between the values an algorithm predicts and the true values. Biased algorithms may produce inaccurate healthcare outcomes and exacerbate health disparities when their results are applied to health interventions. Researchers who implement these algorithms must consider when and how bias may arise. This paper explores algorithmic biases that arise from the data collection, labelling and modelling of NLP algorithms. Researchers have a role in ensuring that efforts to combat bias are carried out, especially when drawing health conclusions from social media posts that are linguistically diverse. Through open collaboration, auditing processes and the development of guidelines, researchers may be able to reduce bias and improve the NLP algorithms used for health surveillance.

  • ethics- medical
  • ethics- research
  • ethics
  • decision making
  • information technology

Data availability statement

Data sharing is not applicable as no datasets were generated and/or analysed for this study. No data are available.


  • Contributors SDY and LF contributed to the design and the formulation of the main arguments. LF, SK and SDY were responsible for the drafting and editing of the paper. LF, SK and SDY contributed to the final editing and approved the final version of the manuscript. LF is the guarantor and takes full responsibility for all aspects of the manuscript.

  • Funding This study was funded by the National Institute of Allergy and Infectious Diseases (NIAID, grant number: 5R01AI132030-05), the National Center for Complementary and Integrative Health (NCCIH), the National Institute on Minority Health and Health Disparities (NIMHD), and the National Institute on Drug Abuse (NIDA).

  • Competing interests None declared.

  • Provenance and peer review Not commissioned; externally peer reviewed.
