An AI toolkit

The purpose of the toolkit is to serve as a communication tool between developers and clinicians. It will produce a single A4-size PDF containing the information we think a clinician must know to be able to safely use any given AI algorithm on any individual they see in their clinic.
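
As a loose illustration of this output (not the toolkit’s actual implementation), a one-page A4 PDF could be generated in Python with matplotlib; every field name and value below is a hypothetical placeholder:

    # Minimal sketch: render a one-page A4 summary PDF with matplotlib.
    # All titles, field names and values are hypothetical placeholders.
    import matplotlib.pyplot as plt

    A4_PORTRAIT_INCHES = (8.27, 11.69)

    summary = {
        "Algorithm": "Example chest X-ray classifier",
        "Intended population": "Adults in England and Wales",
        "Equity evidence": "No association found between demographics and results",
    }

    fig = plt.figure(figsize=A4_PORTRAIT_INCHES)
    fig.suptitle("AI Algorithm Summary for Clinicians", fontsize=16)
    ax = fig.add_axes([0.1, 0.1, 0.8, 0.8])
    ax.axis("off")  # the page is pure text, not a plot
    body = "\n\n".join(f"{key}: {value}" for key, value in summary.items())
    ax.text(0, 1, body, transform=ax.transAxes, va="top", fontsize=11)
    fig.savefig("ai_summary.pdf")  # a single A4 page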

Key Stages in the Toolkit Process:

  1. Population Description:
    • The toolkit uses predefined population data sourced from trusted external databases (e.g., the Office for National Statistics, ONS), which provide detailed demographic and behavioural data. This data is not entered by users but forms part of the context in which the AI’s performance is evaluated.
    • The description includes factors such as the demographic composition of the population and key health data relevant to the population under study (e.g., smoking or obesity rates, life expectancy).
  2. Pathology Description:
    • In this stage, the toolkit focuses on the pathology or medical condition for which the AI model is being used. This includes understanding how different groups (e.g., by gender or age) are affected by the condition.
    • Example: a disease might be more common in men but more deadly in women. This stage ensures the context of the pathology is well understood in relation to the AI’s application (a sketch of how this context might be represented follows this list).
  3. Evaluation and Statistical Analysis:
    • This stage involves analysing the evaluation results that clinicians or developers input into the toolkit. These results typically include patient characteristics (e.g., age, sex, socioeconomic status) and the corresponding AI outputs.
    • The toolkit performs statistical tests to check whether there is any association between patient demographics and the AI’s predictions. The key goal is to assess whether demographic factors affect the AI’s performance. If no association is found, the toolkit provides assurance that there is no evidence the AI model is operating inequitably across different groups (one such test is sketched after this list).
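
As a rough sketch of Stages 1 and 2, the population and pathology context might be represented as simple data structures, shown here in Python. The figures and field names are illustrative placeholders, not real ONS statistics or the toolkit’s actual schema:

    # Minimal sketch of the population (Stage 1) and pathology (Stage 2)
    # context. All figures are illustrative placeholders, not ONS data.
    from dataclasses import dataclass

    @dataclass
    class PopulationContext:
        name: str
        smoking_rate: float           # proportion of adults who smoke
        obesity_rate: float           # proportion of adults classed as obese
        life_expectancy_years: float

    @dataclass
    class PathologyContext:
        condition: str
        prevalence_by_sex: dict       # how common the condition is per group
        mortality_by_sex: dict        # how deadly the condition is per group

    population = PopulationContext(
        name="Example region",
        smoking_rate=0.13,
        obesity_rate=0.26,
        life_expectancy_years=81.0,
    )

    # Mirrors the Stage 2 example: more common in men, more deadly in women.
    pathology = PathologyContext(
        condition="Example condition",
        prevalence_by_sex={"male": 0.08, "female": 0.05},
        mortality_by_sex={"male": 0.02, "female": 0.04},
    )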
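
And as a sketch of the Stage 3 analysis itself: one plausible test (though not necessarily the one the toolkit would use) is a chi-squared test of independence between a demographic factor and the AI’s correctness. The evaluation counts below are invented for illustration:

    # Minimal sketch of Stage 3: test whether a demographic factor is
    # associated with AI performance. The counts below are invented.
    import numpy as np
    from scipy.stats import chi2_contingency

    # Rows: patient sex; columns: (AI correct, AI incorrect).
    contingency = np.array([
        [480, 40],   # male patients
        [455, 45],   # female patients
    ])

    chi2, p_value, dof, expected = chi2_contingency(contingency)

    if p_value < 0.05:
        print(f"Possible association between sex and AI performance (p={p_value:.3f})")
    else:
        print(f"No evidence of an association (p={p_value:.3f})")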

What the Toolkit Achieves:

  • Assurance of Fairness: The toolkit’s main objective is to communicate whether there is sufficient evidence that the AI system is not biased towards a particular group. It does this by statistically analysing the relationship between demographic factors (e.g., age, gender, socioeconomic status) and the AI’s results. If no association is found, clinicians can be reasonably confident that the AI is not exhibiting biased behaviour based on these factors. If there is insufficient evidence to test for an association, clinicians know to take extra care with the affected patient groups.
  • Evidence-Based Confidence: Rather than simply flagging potential biases, the toolkit offers positive assurance: where no association is found, there is no evidence of bias in the AI’s predictions.
  • Limitations: The toolkit does not address biases that arise from how the AI was trained or built (e.g., biased training data or flawed model design). It focuses on clinically relevant data and population factors, not on algorithmic issues such as poor model training. The toolkit’s success depends on the data provided: if insufficient testing has been performed, it cannot give recommendations (one way such gaps might be flagged is sketched below).
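
One way such gaps might be flagged, sketched here under the assumption of a simple minimum-count rule (the threshold and counts are illustrative, not toolkit defaults):

    # Minimal sketch: flag demographic groups with too few evaluation cases
    # to support a meaningful statistical test. The threshold and counts are
    # illustrative assumptions, not toolkit defaults.
    MIN_CASES_PER_GROUP = 30

    evaluation_counts = {   # hypothetical evaluation data
        "male": 520,
        "female": 500,
        "age 80+": 12,      # under-represented in the evaluation
    }

    for group, n in evaluation_counts.items():
        if n < MIN_CASES_PER_GROUP:
            print(f"Insufficient evidence for '{group}' (n={n}): "
                  f"clinicians should take extra care with these patients")
        else:
            print(f"'{group}' adequately tested (n={n})")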

Our toolkit does not address or fix algorithmic bias within the AI model itself (such as issues with model design or training data). Instead, it helps to demonstrate how the AI has been tested, what the results were, and how those results can be interpreted. We hope such a toolkit will clearly indicate when an AI tool can safely be used in routine clinical practice, and when additional clinical judgement is needed.