Statistical and machine learning approaches to developing predictive tools for identifying risk of preterm birth and poor neonatal outcome

Lead Supervisor
Prof Rachel Tribe
Professor of Maternal and Perinatal Science
Department of Women and Children’s Health; School of Life Course & Population Sciences
King’s College London

Dr Yanzhong Wang, Professor Andrew Shennan

Project Details

This PhD project will be embedded within a large programme of preterm birth research led by Professor Tribe (Scientist) and Professor Andrew Shennan (Obstetrician) at KCL and Guy’s and St Thomas’ NHS Foundation Trust. The focus of the multidisciplinary group is to understand the biological processes leading to preterm birth and to harness this information to develop useful clinical tools for identifying women most at risk in early pregnancy, so that they can be stratified to appropriate specialist care. We already have an established statistical algorithm that underpins a smartphone app for identifying risk of preterm birth from ~20 weeks of pregnancy (Quipp app). In this project, we aim to develop this further by incorporating early pregnancy data obtained from multiomics ‘biomarker’ studies. We have a number of data resources already available to us, generated from the INSIGHT cohort, as well as access to data from the anonymised UK PCN database. We plan to link our deeply phenotyped UK pregnancy metadata to data collected via the eLixir programme to give us access to detailed neonatal and early life information. We are also currently generating data from the PRECISE/PRECISE Dyad study (sub-Saharan Africa pregnancy and child cohorts) relating to spontaneous preterm birth, which may be available for use in this PhD project. Dr Wang, co-supervisor, will support the statistical and machine learning aspects of this PhD project, including training. Alongside this, we have active interactions with the Biomedical Research Centre Bioinformatics team for our multiomics integration work.
The PhD student’s progress will be monitored via normal KCL processes and supported through an active programme of activities within the School. The student will be part of a large multidisciplinary team and will be expected to participate in regular journal clubs and lab meetings. Opportunities for teaching will be provided, and presentation of research work at national and international conferences will be actively encouraged.


Datasets include the KCL-generated INSIGHT cohort and multiomics data (ethics approvals in place; data will be pseudonymised/anonymised for analysis), and UK PCN, eLixir and PRECISE data (requests to the relevant Data Access Committees required). Some ‘omics data generated on KCL/GSTT samples will be provided via a commercial collaboration.


Omics; data integration; prediction; pregnancy; preterm birth; machine learning