28 October 2024
Introduction
This case study examines an AI algorithm used during the Covid-19 pandemic to predict which patients were at high risk of hospitalisation from Covid-19 and to advise them to shield.
Background
During the Covid-19 pandemic, several factors were identified that increased a person's risk of being admitted to hospital with Covid-19 and reduced their chance of survival. These risk factors included age, sex assigned at birth, height, weight, BMI, ethnicity and having other conditions such as diabetes. They also included things like how much money people have, where they live, and what jobs they can get, which affect how easy or hard their lives are. These are called socioeconomic factors. Socioeconomic factors were identified using measures such as the Townsend deprivation index, which measures home ownership, employment and overcrowding in an area.
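As an illustration of how an area-level deprivation measure of this kind can be computed, the sketch below follows the broad shape of the Townsend index, which sums standardised area-level census variables. The specific variables, areas and numbers used here are simplified assumptions, not the official definition.

```python
# Sketch of an area-level deprivation score in the spirit of the Townsend index.
# The real index combines standardised census variables (including car ownership
# and log transforms of some components); the variables below are simplified
# assumptions chosen to match the three measures named in the text above.
from statistics import mean, stdev

def z_scores(values: list[float]) -> list[float]:
    """Standardise a list of area-level percentages to mean 0, standard deviation 1."""
    mu, sigma = mean(values), stdev(values)
    return [(v - mu) / sigma for v in values]

def deprivation_scores(unemployment, overcrowding, non_home_ownership):
    """Sum the z-scores for each area: higher totals indicate more deprivation."""
    return [sum(zs) for zs in zip(z_scores(unemployment),
                                  z_scores(overcrowding),
                                  z_scores(non_home_ownership))]

# Three hypothetical areas, with a percentage per area for each measure:
print(deprivation_scores(unemployment=[4.0, 9.0, 15.0],
                         overcrowding=[2.0, 6.0, 11.0],
                         non_home_ownership=[20.0, 35.0, 55.0]))
```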
An AI algorithm (or tool) was developed by the University of Oxford, which used the above measures, amongst others, to predict which patients would be at risk of hospitalisation if they caught Covid-19. Those identified as high risk were advised to shield at home, lowering their chance of catching the disease and protecting them from its worst outcomes. The algorithm was developed with the goal of assessing Covid-19 risk across the whole population. This would allow all at-risk patients to make informed decisions about their health, and would ease pressure on the already overstretched health system at the time by reducing the number of people who became severely ill.
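To make the idea of a population risk tool concrete, here is a minimal sketch of a score of this general shape, assuming a simple logistic model. It is not the Oxford algorithm: the risk factors, weights and shielding threshold are illustrative assumptions only.

```python
import math
from dataclasses import dataclass

# Minimal sketch of a population risk tool (NOT the Oxford model):
# a logistic-style score over a few illustrative risk factors, with a
# hypothetical threshold above which a patient is advised to shield.

@dataclass
class Patient:
    age: int
    bmi: float
    has_type2_diabetes: bool
    townsend_quintile: int  # 1 (least deprived) to 5 (most deprived)

WEIGHTS = {"age": 0.05, "bmi": 0.03, "diabetes": 1.2, "deprivation": 0.2}  # illustrative only
INTERCEPT = -8.0
SHIELD_THRESHOLD = 0.05  # hypothetical probability cut-off for shielding advice

def hospitalisation_risk(p: Patient) -> float:
    """Estimated probability of hospitalisation if the patient catches Covid-19."""
    linear = (INTERCEPT
              + WEIGHTS["age"] * p.age
              + WEIGHTS["bmi"] * p.bmi
              + WEIGHTS["diabetes"] * p.has_type2_diabetes
              + WEIGHTS["deprivation"] * p.townsend_quintile)
    return 1.0 / (1.0 + math.exp(-linear))  # logistic link

def advise_shielding(p: Patient) -> bool:
    """Advise shielding when the estimated risk crosses the threshold."""
    return hospitalisation_risk(p) >= SHIELD_THRESHOLD

print(advise_shielding(Patient(age=82, bmi=31.0, has_type2_diabetes=True, townsend_quintile=4)))
```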
Biases and harms
The algorithm incorrectly advised people with gestational diabetes and those with missing health data to shield. This disproportionately affected pregnant people, people with less access to healthcare, and people whose health records were outdated or incomplete. Though shielding was recommended at the time for those with weakened immune systems, it had several impacts on the people advised to do so. A survey and interviews of clinically extremely vulnerable (high-risk) people who were asked to shield revealed that all of the patients interviewed felt they could not have shielded without external support, and many (66%) had to ask family for help. 35% agreed that shielding was making their physical health worse and 43% reported a negative impact on their mental health. This study indicates that, although only a small number of people may have been wrongly advised to shield by the algorithm, shielding had a significant effect on the lives of those who were.
Analysis and contributing factors
The data used to train the algorithm was collected from seven national datasets. As NHS Digital is the information partner to the NHS, it had legal permission to collect the data, which was hosted at the University of Oxford. Although the data was anonymised, patients were not given the choice to opt out of being included in the dataset.
Initial analysis of the algorithm found that it incorrectly classed some patients with gestational diabetes as having Type 2 diabetes, and therefore as being at higher risk; these patients were incorrectly advised to shield. In addition, where data was missing for a patient, default values carrying a higher-than-average associated risk were filled in. This means that risk scores were likely to be overestimated for people with missing information, so more of these people were identified as potentially high risk (clinically extremely vulnerable). All of these patients were added to a 'Shielded Patient List' and, though shielding was advisory, patients could only be removed from the list by request.
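The effect of this missing-data handling can be shown with a small sketch. The field names and default values below are hypothetical; the point is that filling gaps with worse-than-average defaults systematically pushes up the scores of patients with incomplete records.

```python
# Illustrative sketch (not the actual pipeline): imputing missing fields with
# worse-than-average defaults inflates the inputs, and therefore the risk score,
# for patients with gaps in their records. Field names and values are hypothetical.
POPULATION_AVERAGE = {"bmi": 26.0, "age": 48, "townsend_quintile": 3}
HIGH_RISK_DEFAULTS = {"bmi": 35.0, "age": 70, "townsend_quintile": 5}

def impute(record: dict, defaults: dict) -> dict:
    """Fill any missing fields with the given default values."""
    return {field: record.get(field, default) for field, default in defaults.items()}

# A patient with only their age recorded:
sparse_record = {"age": 52}

# High-risk defaults vs. population averages give very different model inputs
# for the same patient, even though nothing about the patient has changed.
print(impute(sparse_record, HIGH_RISK_DEFAULTS))   # {'bmi': 35.0, 'age': 52, 'townsend_quintile': 5}
print(impute(sparse_record, POPULATION_AVERAGE))   # {'bmi': 26.0, 'age': 52, 'townsend_quintile': 3}
```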
Mitigation and recommendations
The algorithm was tested using anonymised data from more than 8 million people's GP records, hospital records and mortality data from late January 2020 to April 2020. Although the model erred on the side of caution by assuming higher risk when unsure, it treated the impact of false positives and false negatives as the same for all participants (a false positive is incorrectly diagnosing a patient with a disease; a false negative is incorrectly telling a patient they do not have one). This applies particularly to the decision to assign higher-risk default values to patients with missing data. To avoid this, a different process could have been used for patients with missing data, as sketched below, or these patients could have been excluded from the data the algorithm used to learn.
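As a sketch of the alternative process suggested above (an assumption, not what was actually done), incomplete records could be routed to a separate review path rather than being scored with worst-case defaults:

```python
# Hypothetical alternative: incomplete records go to clinical review instead of
# being scored automatically with worst-case defaults. The field set, threshold
# and toy scoring function are all assumptions for illustration.
REQUIRED_FIELDS = {"age", "bmi", "townsend_quintile"}
SHIELD_THRESHOLD = 0.05

def toy_risk_score(record: dict) -> float:
    """Stand-in for the fitted model: a crude score that grows with age and BMI."""
    return min(1.0, 0.001 * record["age"] + 0.001 * record["bmi"])

def triage(record: dict) -> str:
    missing = REQUIRED_FIELDS - record.keys()
    if missing:
        # Flag for clinical review instead of guessing worst-case values.
        return "manual review (missing: " + ", ".join(sorted(missing)) + ")"
    return "advise shielding" if toy_risk_score(record) >= SHIELD_THRESHOLD else "no shielding advice"

print(triage({"age": 52}))                                        # manual review (missing: bmi, townsend_quintile)
print(triage({"age": 80, "bmi": 34.0, "townsend_quintile": 4}))   # advise shielding
```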
Additionally, there could have been more transparency and explanation for the people added to the Shielded Patient List, as receiving a letter advising them that they are on such a list may be distressing. Greater transparency would have allowed patients to make informed decisions about whether they wanted to shield.
Glossary of terms
Training data
The data used as input to an AI model so that it can learn patterns in the data.
Clinically extremely vulnerable
People who were at a higher risk of becoming seriously ill from Covid-19, particularly due to comorbidities (other existing health conditions).
References
Coronavirus (COVID-19) risk assessment. NHS England Digital. 27 March 2024.
COVID-19 Population Risk Assessment. NHS England Digital. 18 December 2023.
Code list used for population risk assessment. NHS England Digital. 18 December 2023.
COVID-19 Population Risk Assessment transparency notice. NHS England Digital. 29 June 2022.