Predicting pregnancy has been a fundamental problem in women's health for more than 50 years. Previous datasets have been collected via carefully curated medical studies, but the recent growth of women's health tracking mobile apps offers potential for reaching a much broader population. However, the feasibility of predicting pregnancy from mobile health tracking data is unclear. Here we develop four models -- a logistic regression model and three LSTM models -- to predict a woman's probability of becoming pregnant using data from a women's health tracking app, Clue by BioWink GmbH. Evaluating our models on a dataset of 79 million logs from 65,276 women with ground-truth pregnancy test data, we show that our predicted pregnancy probabilities meaningfully stratify women: women in the top 10% of predicted probabilities have an 89% chance of becoming pregnant over 6 menstrual cycles, as compared to a 27% chance for women in the bottom 10%. We develop a technique for extracting interpretable time trends from our deep learning models, and show that these trends are consistent with previous fertility research. Our findings illustrate the potential that women's health tracking data offers for predicting pregnancy in a broader population; we conclude by discussing the steps needed to fulfill this potential.