Use of machine learning and clinical phenotyping to identify determinants and predict CVMD risk using data from registries and electronic medical records

KCL Supervisor
Dr Vasa Curcin
Senior Lecturer in Health Informatics
School of Population Health and Environmental Sciences, King’s College London

Clinical Advisor
Dr Mark Ashworth
Reader in Primary Care
King’s College London

Metadvice Supervisors
Professor Andrew Krentz or Professor Richard Barker

Industry Partner

Project Details

I. Background

This studentship is a collaboration between KCL and Metadvice Ltd, an early-stage digital health company focused on building high-quality AI-driven clinical decision support tools. It will build on KCL's work on clinical decision support, CVMD risk, and computable phenotypes.

This project combines the AI capabilities being pioneered by Metadvice with the medical informatics expertise of KCL and the clinical experience gained by senior NHS GPs in Lambeth. The result will be insights into the better management of cardiovascular disease (CVD), specifically high lipids, and a working decision support tool that embeds those insights. 

Understanding of the intersection between cardiovascular (CV), renal and metabolic diseases (MD) has expanded. These diseases and their associated complications overlap clinically, which helps explain why reducing CV risk is so complex. CV disease is currently the number one cause of death in the US, and places an enormous burden not only on patients' overall health and well-being, but also on society and healthcare systems as a whole. CV disease is the leading cause of death in people with chronic kidney disease (CKD) and diabetes. Adults with diabetes are two to four times more likely to die from CV disease than adults without diabetes. It is therefore important to study CV, renal and metabolic diseases (CVMD) as interconnected conditions, rather than managing each in isolation.

This is an area of national priority: the 2019 NHS Long Term Plan aims to prevent up to 150,000 heart attacks, strokes and dementia cases over the next 10 years[1]. Better lipid management also requires detection of familial hypercholesterolemia, a condition of great interest to pharmaceutical companies, as illustrated by the accelerated phase 3 clinical trials of inclisiran to treat hypercholesterolemia[2].

Metadvice is the first company to apply neural network (NN) AI to optimising CVD treatment decisions on a personalised basis in UK primary care, working with the practice selected for this project. The NNs are initially trained using ‘synthetic patients’ reflecting standard treatment guidelines, but then progressively acquire insights from actual patient data to verify diagnostic decisions and enable highly personalised treatment choices. These are presented through an engaging interface (already available in prototype form) that depicts the patient journey over time and estimates the probability of future CVD events with and without appropriate treatment. This enables a much more personalised and informed doctor-patient dialogue, and therefore higher concordance with therapy and behaviour change.

Electronic Health Records (EHR) are a rich source of longitudinal patient data that can be analysed to determine the clinical predictors of a specific disease, to stratify patients by risk, and to generate clinical profiles for screening, monitoring and treatment based on the results[3]. The complexity of CVMDs potentially lends itself well to machine learning (ML) methods, which can incorporate a large variety of variables and observations into one predictive framework without the need for preprogrammed rules. There has been increasing interest in the use of ML to predict CVMD outcomes, with the hope that such methods could make use of large, routinely collected datasets and deliver accurate, personalised information on prognosis.

Clinical risk scores have been developed previously to identify those at high risk of stroke and other diseases. This proposal will build on this work by assessing whether clinical profiling and statistical and machine learning techniques can meaningfully improve risk prediction.
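One common way to assess whether a new model improves on an existing risk score is to compare discrimination on held-out patients using the concordance (c) statistic. The sketch below is illustrative only: the outcome labels and predicted risks are invented, and the names `baseline_risk` and `candidate_risk` are hypothetical stand-ins for an established clinical score and a candidate ML model, not outputs from any project dataset.

```python
from itertools import product


def c_statistic(scores, outcomes):
    """Concordance (c) statistic: the probability that a randomly chosen
    patient who had the event was given a higher score than a randomly
    chosen patient who did not. Tied scores count as half."""
    cases = [s for s, y in zip(scores, outcomes) if y == 1]
    controls = [s for s, y in zip(scores, outcomes) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for case, control in product(cases, controls):
        if case > control:
            concordant += 1.0
        elif case == control:
            concordant += 0.5
    return concordant / (len(cases) * len(controls))


# Invented held-out patients: 1 = event observed during follow-up.
outcomes = [1, 0, 1, 0, 0, 1, 0, 0]
# Invented predicted risks from an existing score and a candidate model.
baseline_risk = [0.30, 0.25, 0.20, 0.10, 0.35, 0.40, 0.15, 0.05]
candidate_risk = [0.45, 0.20, 0.25, 0.10, 0.30, 0.50, 0.15, 0.05]

baseline_c = c_statistic(baseline_risk, outcomes)
candidate_c = c_statistic(candidate_risk, outcomes)
print(f"baseline c-statistic:  {baseline_c:.3f}")
print(f"candidate c-statistic: {candidate_c:.3f}")
```

Discrimination is only one aspect of performance; a fuller comparison would typically also examine calibration and use confidence intervals on a much larger validation cohort.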

II. Aims

The key aims of this research project are: 

  • To describe the incidence and prevalence of CVMDs in patients and derive associations with cardiovascular risk factors, using the LDN, SLSR and SSNAP databases together with GSTT hospital data.
  • To generate clinical profiles of the patients based on risk stratification and assessment of the risk of future thrombotic events. These shall be published in a public phenotype repository, e.g. HDR UK Human Phenome repository and PhEMA.
  • To use statistical and machine learning techniques to develop risk prediction scores for key CVMDs (e.g. stroke) using electronic medical records.
  • To validate the cohort-based clinical risk scores and compare their performance with that of the proposed techniques.
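As a rough illustration of the first aim, baseline prevalence and an incidence rate per 1,000 person-years could be computed along the following lines. This is a minimal sketch under stated assumptions: the records and the field names (`prevalent_at_baseline`, `years_at_risk`, `new_case`) are hypothetical and do not reflect the actual structure of the LDN, SLSR or SSNAP data.

```python
# Invented cohort. "years_at_risk" is assumed to be follow-up time until
# first diagnosis, death, deregistration or end of study, whichever is first.
records = [
    {"prevalent_at_baseline": True,  "years_at_risk": 0.0, "new_case": False},
    {"prevalent_at_baseline": False, "years_at_risk": 5.0, "new_case": False},
    {"prevalent_at_baseline": False, "years_at_risk": 2.5, "new_case": True},
    {"prevalent_at_baseline": False, "years_at_risk": 5.0, "new_case": False},
    {"prevalent_at_baseline": False, "years_at_risk": 3.5, "new_case": True},
]


def point_prevalence(records):
    """Proportion of patients who already had the condition at baseline."""
    existing = sum(1 for r in records if r["prevalent_at_baseline"])
    return existing / len(records)


def incidence_rate_per_1000(records):
    """New cases per 1,000 person-years among patients who were free of
    the condition at baseline."""
    at_risk = [r for r in records if not r["prevalent_at_baseline"]]
    person_years = sum(r["years_at_risk"] for r in at_risk)
    new_cases = sum(1 for r in at_risk if r["new_case"])
    return 1000 * new_cases / person_years


prevalence = point_prevalence(records)
incidence = incidence_rate_per_1000(records)
print(f"baseline prevalence: {prevalence:.1%}")
print(f"incidence rate: {incidence:.1f} per 1,000 person-years")
```

Patients with the condition at baseline are excluded from the incidence denominator because they are no longer at risk of becoming a new case.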

III. Data Sources

  1. South London Stroke Registry (SLSR): The world’s longest-running population-based stroke register, with long-term follow-up.
  2. Sentinel Stroke National Audit Programme (SSNAP): National healthcare quality improvement programme for measuring the quality and organisation of stroke care in the NHS.
  3. Lambeth Data Net (LDN): Collection of primary care data from all GP practices in the London Borough of Lambeth.

Outcomes of this work will be used to iterate and improve the design of AI-driven clinical decision support tools and systems for clinical practice. 

Potential candidates should be IT-fluent, have an interest in the human/machine interface, and have sufficient awareness of medicine to appreciate the nature of clinical decision-making.