June 2025
Introduction
This case study explores the use of artificial intelligence (AI) in women’s health, both in formal health settings such as hospitals and general practice, and in informal settings such as a person’s home or the mobile applications they use.
This study focuses specifically on fairness concerns in pregnancy and maternal health care. AI tools have been shown to work less well in minority populations (1). We will also look at risks in period-tracking and fertility applications, where AI integration is becoming increasingly common.
If AI in healthcare is poorly designed or not checked thoroughly, it can make existing health disparities worse and put people’s privacy, choice and ability to access fair care at risk.
Background
AI is now used in many areas of women’s health — both in formal settings like hospitals, and in personal tools like apps.
In maternity care, machine learning models help predict risks like preeclampsia and serious complications, aiming to support faster and better clinical decisions (2,3). Key players include the NHS, researchers, tech companies and patients.
Meanwhile, period and fertility tracking apps have evolved from simple calendars to tools that use AI to spot patterns and give predictions. These apps are often made by private companies and used by over 100 million people globally.
While they can help people understand their bodies, concerns have been raised about data privacy, unclear regulation and the opacity of how their algorithms actually work. In some parts of the US, following the overturning of Roe v. Wade (which removed the federal right to abortion), there have been cases where menstrual data has been requested in legal investigations, increasing the potential risks for users (4).
Despite the growing use of AI tools in maternity care, significant disparities in outcomes remain. In the UK, Black women are still nearly four times more likely to die during childbirth than white women. This raises important questions about whether these technologies are tackling the most critical problems or potentially diverting attention from the structural changes needed to address them (5).
This raises two big questions:
- Is AI really helping with the most important problems in maternity care?
- Are health apps safe, fair and inclusive as they become more reliant on AI?
Biases and Harms
AI in women’s health creates risks in both clinical and consumer settings, though in different ways. These risks often affect groups already facing disadvantages—especially those impacted by health, racial and gender inequalities.
Clinical Systems: Risk of Making Things Worse
AI tools are increasingly being used to support decision-making in maternity care. In the UK, the C2-Ai Maternity and Neonatal Observatory, launched in 2024, provides retrospective and real-time analysis of maternity data across 47 clinical risk factors to help NHS providers benchmark outcomes, allocate resources and address health inequalities (6).
Similarly, PeriWatch Vigilance®, an AI-driven early warning system, monitors maternal and foetal vital signs during labour to alert clinicians to potential distress (7). Researchers have also developed machine learning models to predict severe maternal morbidity and preeclampsia using patient records and biomarkers (2,3).
These systems aim to reduce preventable harm and improve outcomes. However, even well-intentioned tools can unintentionally replicate existing health disparities if built on biased or incomplete data. A landmark U.S. study found that a risk prediction algorithm significantly underestimated the needs of Black patients because it used healthcare costs as a proxy for health need, a metric that embeds systemic inequality (8). While maternity AI tools may not follow this approach, the risk is transferable: if models are trained on skewed datasets or underrepresent certain populations, they may fail to detect risk in those groups. In the UK, where Black women are nearly four times more likely to die during childbirth than white women (5), such blind spots could contribute to delayed interventions and unequal care outcomes. Clinical AI may offer the appearance of objectivity while encoding long-standing systemic inequities.
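To make the cost-as-proxy mechanism concrete, the toy sketch below uses invented numbers (they are not drawn from the cited study) to show how ranking patients by observed spending rather than by underlying need can systematically deprioritise a group whose costs are suppressed by barriers to access.

```python
# Toy illustration of proxy bias: two groups with identical underlying need,
# but Group B incurs lower healthcare costs because of barriers to access.
# All numbers are invented, for illustration only.

patients = [
    # (group, underlying_need, observed_annual_cost)
    ("A", 8, 12_000), ("A", 6, 9_000), ("A", 4, 5_000),
    ("B", 8, 4_500),  ("B", 6, 3_000), ("B", 4, 1_500),
]

# A "cost as proxy for need" score simply ranks patients by observed spending.
by_cost = sorted(patients, key=lambda p: p[2], reverse=True)

# Select the top half for additional support, as a cost-based risk score would.
selected = by_cost[: len(patients) // 2]
print([(group, need) for group, need, _ in selected])
# -> [('A', 8), ('A', 6), ('A', 4)]
# The Group B patient with the highest need (8) is passed over, while a
# lower-need Group A patient (4) is selected, purely because of spending.
```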
Consumer Technologies: Documented Oversteps and Emerging Risks
In the consumer space, several harms have already come to light. The period tracking app Flo was found to have shared intimate reproductive health data with Facebook and Google, despite promising users it would remain private (9). The incident prompted regulatory action by the U.S. Federal Trade Commission and highlighted the gap between user expectations and actual data practices. Similarly, in Argentina, a Microsoft-backed AI system used socioeconomic and demographic data to predict teenage pregnancy risk, raising alarm over consent, profiling and digital discrimination (10).
In 2023, reports emerged that UK police had requested data from menstrual tracking apps and tested women for abortion-inducing drugs after unexplained pregnancy losses (11). In such contexts, reproductive data can become a legal vulnerability, with profound implications for autonomy, trust and safety.
Analysis and Contributing Factors
Biased Data and Embedded Assumptions
AI systems in women’s health are often built on datasets that reflect limited population diversity and normative assumptions about bodies and care. In clinical contexts, AI tools are commonly trained on retrospective hospital data to assess patient risk. These datasets frequently overrepresent white, urban populations and underrepresent groups who face systemic barriers to healthcare access, such as Black and minority ethnic women (13). While these systems are intended to improve safety and support resource allocation, they may unintentionally reproduce patterns of underdiagnosis or undertreatment present in historical data.
Menstrual and fertility tracking apps commonly use models based on a 28-day cycle with mid-cycle ovulation (12). While this may suit some users, it risks misrepresenting or excluding those with irregular cycles or other reproductive health variations. These apps rarely disclose how their algorithms are trained or for whom they are optimised. Critics have also noted that such tools often reflect normative assumptions about gender and reproduction, which can influence how users understand and relate to their bodies (4).
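As a rough illustration of how restrictive that assumption is, the sketch below shows a minimal, hypothetical calendar-method predictor (not any specific app’s algorithm): ovulation and the next period are fixed offsets from the last recorded period start, so cycle-to-cycle variation is simply ignored.

```python
from datetime import date, timedelta

def naive_ovulation_prediction(last_period_start: date,
                               assumed_ovulation_day: int = 14) -> date:
    """Calendar-method prediction: ovulation on a fixed day of the cycle.

    This mirrors the simplifying assumption described above; because it never
    looks at the user's own cycle history, it misrepresents irregular cycles.
    """
    return last_period_start + timedelta(days=assumed_ovulation_day - 1)

def naive_next_period(last_period_start: date,
                      assumed_cycle_length: int = 28) -> date:
    """Next period predicted as a fixed 28-day offset."""
    return last_period_start + timedelta(days=assumed_cycle_length)

# A user whose cycles actually range from 24 to 35 days receives exactly the
# same predictions as one with textbook 28-day cycles.
print(naive_ovulation_prediction(date(2025, 6, 1)))  # 2025-06-14
print(naive_next_period(date(2025, 6, 1)))           # 2025-06-29
```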
AI systems built on incomplete or homogeneous data, and shaped by implicit assumptions about what is “normal”, risk marginalising those already underserved by health and data infrastructures.
Lack of Context-Specific Validation
AI tools are not always evaluated for fairness or accuracy across diverse demographic groups or different health system contexts. Clinical models that perform well in one country or institution may fail to translate when applied elsewhere due to differences in population demographics, clinical workflows, or systemic inequalities (14). For example, an AI model designed and validated in an urban setting may not adequately capture the needs of pregnant patients in a rural area, where care pathways, patient profiles and outcome disparities differ significantly.
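The sketch below illustrates the kind of external check this implies, using hypothetical file and column names and a generic scikit-learn model rather than any deployed system: a model developed at one site is re-evaluated on another site’s data before it is trusted there.

```python
# Sketch: external validation of a risk model in a different care setting.
# File and column names are hypothetical; any tabular risk model would do.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

FEATURES = ["age", "parity", "bmi", "systolic_bp"]   # illustrative predictors
TARGET = "severe_morbidity"

urban = pd.read_csv("urban_site.csv")    # development cohort
rural = pd.read_csv("rural_site.csv")    # external cohort with a different case mix

model = LogisticRegression(max_iter=1000)
model.fit(urban[FEATURES], urban[TARGET])

# Internal performance can look reassuring while external performance collapses.
auc_internal = roc_auc_score(urban[TARGET], model.predict_proba(urban[FEATURES])[:, 1])
auc_external = roc_auc_score(rural[TARGET], model.predict_proba(rural[FEATURES])[:, 1])
print(f"AUC, development site: {auc_internal:.2f}")
print(f"AUC, external site:    {auc_external:.2f}")
```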
Gaps in Regulation and Oversight
Period and fertility tracking apps fall outside traditional medical device regulation as they are marketed as wellness products rather than clinical tools. This regulatory grey zone has allowed companies to introduce predictive features (e.g., ovulation forecasts, fertility scoring, or symptom analysis) without the scrutiny applied to regulated medical technologies. As a result, these systems may process sensitive reproductive data with limited transparency, unclear accountability and inconsistent privacy protections. National regulatory bodies are beginning to address these concerns, but progress remains uneven (15).
In the UK, the MHRA regulates AI tools that qualify as medical devices. In practice, how an AI tool is regulated depends more on how the product is categorised than on its societal impact.
Structural Inequities in Healthcare
The deployment of AI systems into already unequal healthcare environments can amplify harm. In both the UK and globally, Black and minority ethnic women experience significantly poorer maternal outcomes due to a complex interplay of social, structural and institutional factors (5). When AI tools are designed and implemented without explicitly addressing these disparities, they risk reinforcing existing inequities—automating bias rather than correcting for it.
Mitigation and Recommendations
Improve Data Diversity and Transparency
AI systems must be trained on data that reflects the full diversity of the populations they are intended to serve. This means ensuring representation across ethnicity, geography and socioeconomic status. Developers should also be transparent about who is included in training data, and who may be excluded.
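One way to make this transparency concrete is to publish a simple representation audit comparing the training cohort with a reference population, as in the hypothetical sketch below (group labels, file names and reference proportions are illustrative placeholders, not real statistics).

```python
# Sketch: representation audit comparing training data to a reference population.
import pandas as pd

train = pd.read_csv("training_cohort.csv")   # hypothetical development cohort

# Reference shares would come from census or service-population statistics;
# the figures here are placeholders.
reference = {"White": 0.80, "Black": 0.04, "Asian": 0.10, "Mixed/Other": 0.06}

observed = train["ethnicity"].value_counts(normalize=True)

audit = pd.DataFrame({
    "share_in_training_data": observed,
    "share_in_population": pd.Series(reference),
})
audit["shortfall"] = audit["share_in_population"] - audit["share_in_training_data"]
print(audit.round(3))
# Positive shortfalls flag groups under-represented relative to the
# population the model is intended to serve.
```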
Embed Equity Testing and Subgroup Validation
AI models should undergo subgroup performance testing to identify disparities in predictive accuracy across key groups (e.g. ethnicity, sex, age, socioeconomic status, comorbidities). This requires fairness auditing to become a standard part of the validation process.
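A minimal sketch of what such an audit could look like is shown below, using hypothetical column names; a real fairness audit would also examine calibration and uncertainty. The key idea is that predictions are scored separately for each subgroup, so performance gaps are surfaced rather than averaged away.

```python
# Sketch: per-subgroup performance audit for a binary risk model.
# 'ethnicity', 'y_true' and 'y_pred' are hypothetical column names.
import pandas as pd
from sklearn.metrics import recall_score, precision_score

results = pd.read_csv("validation_predictions.csv")  # one row per patient

audit = results.groupby("ethnicity").apply(lambda g: pd.Series({
    "n": len(g),
    # Sensitivity: of the patients who truly had the outcome, how many were flagged?
    "sensitivity": recall_score(g["y_true"], g["y_pred"], zero_division=0),
    "precision": precision_score(g["y_true"], g["y_pred"], zero_division=0),
    "flag_rate": g["y_pred"].mean(),
}))
print(audit)
# Large gaps in sensitivity between groups indicate that the model misses
# high-risk patients in some populations more often than in others.
```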
Apply Robust Regulatory Oversight
Existing frameworks like the NICE Evidence Standards Framework should be consistently applied to clinical AI tools, with particular attention to equity and transparency. For consumer health apps, including those using AI for menstrual or fertility tracking, regulatory agencies have clarified when products fall under medical device regulation—typically when they make diagnostic or predictive health claims. Enforcement of existing standards should be strengthened, and clearer communication is needed to help users understand which tools are regulated. Privacy protections should be consistently applied to sensitive reproductive data.
Involve Users and Patients in Design
AI tools for women’s health should be developed with meaningful input from users. This includes women and communities historically marginalised by healthcare systems. Participatory design can surface needs and risks that developers may overlook.
Address Underlying Structural Inequities
AI systems must be designed and deployed with an understanding of the structural inequalities that shape health outcomes. This includes recognising the historical and ongoing impact of racism, sexism and social exclusion in clinical decision-making. Tools built without this awareness risk replicating the very harms they are meant to mitigate.
Glossary of Terms
Algorithmic Logic: The decision-making process used by an AI system to analyse input data and produce an output or prediction. In many AI tools, especially those using complex models like machine learning, this logic can be difficult to interpret or explain, leading to concerns about transparency and accountability.
Training Data: The dataset used to "teach" an AI system how to recognise patterns and make predictions. If this data is not representative of all populations, the system may perform poorly or unfairly.
Predictive Model: A type of AI system that uses data to forecast outcomes or risks, such as the likelihood of developing a medical condition.
Healthcare Disparities: Differences in health outcomes or access to care that are linked to social, economic, or environmental disadvantages, often affecting minoritised or underserved populations.
Preeclampsia: A potentially dangerous pregnancy complication characterised by high blood pressure and signs of organ damage, usually occurring after 20 weeks of gestation.
Severe Maternal Morbidity: Unexpected complications during labour or delivery that can have serious, long-term health consequences for the mother.
Menstrual/Fertility Tracking Apps: Consumer-facing digital tools, often using AI, that help users monitor their menstrual cycles, predict ovulation, or track fertility symptoms.
Fairness Auditing: The process of testing an AI system to evaluate whether it performs equitably across different population subgroups (e.g. by race, gender, or health status).
Wellness Product: A category of consumer product that promotes health or wellbeing but is not regulated as a medical device, often used to describe health apps that avoid clinical claims.
References
- Rickman S. Evaluating gender bias in large language models in long-term care. BMC Med Inform Decis Mak. 2025 Aug 11;25(1):274.
- Layton AT. Artificial Intelligence and Machine Learning in Preeclampsia. Arterioscler Thromb Vasc Biol. 2025 Feb;45(2):165–71.
- Rodríguez EA, Estrada FE, Torres WC, Santos JCM. Early Prediction of Severe Maternal Morbidity Using Machine Learning Techniques. In: Advances in Artificial Intelligence - IBERAMIA 2016: 15th Ibero-American Conference on AI, San José, Costa Rica, November 23-25, 2016, Proceedings. Berlin, Heidelberg: Springer-Verlag; 2016 [cited 2025 Jun 14]. p. 259–70.
- Hammond E, Burdon M. Intimate harms and menstrual cycle tracking apps. Comput Law Secur Rev. 2024 Nov 1;55:106038.
- MBRRACE-UK. Saving Lives, Improving Mothers’ Care 2023 - Lessons learned to inform maternity care from the UK and Ireland Confidential Enquiries into Maternal Deaths and Morbidity 2019-21. Oxford: NPEU, University of Oxford; 2023
- Building Better Healthcare. Maternity ‘observatory’ system launched to help NHS measure and enhance safety. 2024.
- PeriWatch Vigilance - Perinatal Early Warning System. PeriGen.
- Obermeyer Z, Powers B, Vogeli C, Mullainathan S. Dissecting racial bias in an algorithm used to manage the health of populations. Science. 2019 Oct 25;366(6464):447–53.
- AIAAIC - Flo menstrual cycle data sharing.
- AIAAIC - Microsoft Plataforma Tecnológica de Intervención Social.
- Tortoise Media. British police testing women for abortion drugs.
- Worsfold L, Marriott L, Johnson S, Harper JC. Period tracker applications: What menstrual cycle information are they giving women? Womens Health. 2021 Oct 9;17:17455065211049905.
- Chinta SV, Wang Z, Palikhe A, Zhang X, Kashif A, Smith MA, et al. AI-driven healthcare: Fairness in AI healthcare: A survey. PLOS Digit Health. 2025 May 20;4(5):e0000864.
- Riley RD, Ensor J, Snell KIE, Archer L, Whittle R, Dhiman P, et al. Importance of sample size on the quality and utility of AI-based prediction models for healthcare. Lancet Digit Health. 2025 Jun 2;0(0).
- Female health app developers: a reminder to prioritise privacy.
- NICE. Evidence standards framework (ESF) for digital health technologies.