Understanding Patient Heterogeneity through Machine Learning: A Study of Clozapine Adverse Drug Reactions

Lead Supervisor
Dr Zina Ibrahim
Department of Biostatistics and Health Informatics, King’s College London

Professor Richard Dobson
Professor of Medical Bioinformatics
Department of Biostatistics and Health Informatics, King’s College London

Project Details

This project will explore a range of unsupervised and supervised machine-learning models to uncover subphenotypes within heterogeneous patient populations, using data mined from electronic health records and biomedical repositories. Its aim is to derive tested pipelines that quantify disease heterogeneity by identifying subphenotypes with differing responses to treatment.

The digitisation of patient care has produced a data deluge in medicine, which presents a major opportunity to uncover algorithmic insight and high-quality indicators of disease heterogeneity. However, because the available data are high-dimensional, noisy and irregular, uncovering true patterns within a patient’s treatment timeline is computationally challenging. This project is built around developing a suite of pattern recognition algorithms that model disease subphenotypes and identify the features responsible for heterogeneity within patient subpopulations, and then using these algorithms to understand variations in drug response. The models developed in this project will address two issues that current machine learning models used in medicine lack: a) robustness to noise, and b) the ability to interpret the observed differences between the populations attributed to each identified subphenotype.

The project will build on the supervisors’ expertise in knowledge representation, machine learning and medical informatics, and on their existing work on uncovering disease subtypes through knowledge modelling and machine learning. The project will also use existing resources, namely the Clinical Record Interactive Search (CRIS) and the electronic health records of King’s College Hospital (KCH).

Project Aims: 

  1. Develop and evaluate a suite of pattern recognition algorithms to identify heterogeneous subpopulations of patients with distinct phenotypic and genetic characteristics.
  2. Study the variations among the uncovered subpopulations in terms of response to treatment and adverse drug reactions. 
  3. Build and evaluate a prototype decision support tool to predict clozapine-induced adverse reactions.

Novelty and Importance:

The models and results generated by the project will advance state-of-the-art machine learning models and contribute to new developments in personalised medicine. The project will also provide much-needed insight into the administration and potential adverse effects of clozapine. Clozapine is the gold standard for managing treatment-resistant schizophrenia, yet it is underutilised due to concerns over its side effects, some of which are potentially fatal and require frequent monitoring.


This project will use data from the South London and Maudsley NHS Trust and King’s College Hospital, accessed through existing governance frameworks via research passports.


Keywords: machine learning, mental health, adverse drug reactions, clozapine