Using machine learning techniques to explore multimorbidity progression in patients with organic mental disorders

Lead Supervisor
Dr Zina Ibrahim
Lecturer in Computer Science for Health Informatics
Department of Biostatistics and Health Informatics, King’s College London
zina.ibrahim@kcl.ac.uk

Co-supervisor
Dr Rebecca Bendayan
Department of Biostatistics and Health Informatics, King’s College London

Project Details

Background: 

Recent estimates suggest that by 2035 around 67% of UK citizens over 65 years old will have multimorbidity (2 or more chronic health conditions), around half of whom will have cognitive difficulties (Kingston et al., 2018). Cognitive difficulties are common in individuals diagnosed with organic diseases (e.g., dementia) and are known predictors of disability. There is a need to identify determinants of increased risk of disability, and one such risk factor is the presence of other co-existing health conditions. This project therefore aims to identify patterns of multimorbidity progression in order to identify patients at greater risk of disability. To do this, we will use electronic health records and develop machine learning models to cluster individuals by their patterns of multimorbidity progression and examine the association of these patterns with trajectories of disability over time.

Novelty and Importance:

From a clinical point of view, this project would make it possible to identify individuals at higher risk of complex multimorbidity (and, consequently, disability and mortality) in SLaM, which could allow targeted interventions to be developed earlier. From a methodological perspective, the tools developed would be directly relevant and applicable to electronic health records collected by CRIS SLaM and other CRIS resources.

Primary aim(s):

To identify the most common patterns of multimorbidity progression in individuals with organic disease and their association with disability trajectories. This will provide an overview of which physical health conditions are more likely to appear in the first 5 and 10 years after diagnosis and whether these are associated with common risk factors such as age, sex, medication, health behaviour and social and environmental factors at the time of diagnosis.

Specific objectives:

  1. To develop and validate data extraction tools using natural language processing techniques.
  2. To explore which physical health conditions are more likely to develop within 5 and 10 years of diagnosis.
  3. To identify characteristics of individuals at higher risk of developing complex multimorbidity patterns and higher levels of disability over time.
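
As an illustration of objective 2, the following minimal Python sketch flags which physical health conditions are first recorded within 5 and within 10 years of the index diagnosis. It is only a sketch under stated assumptions: the patient identifiers, condition names, dates and function names are hypothetical toy values, not the CRIS schema or the project's actual extraction pipeline.

```python
from datetime import date

# Hypothetical toy records (illustrative only, not the CRIS schema).
# Date of the index (organic disorder) diagnosis for each patient.
index_diagnosis = {"p1": date(2005, 3, 1), "p2": date(2008, 6, 15)}

# (patient_id, condition, date the condition was first recorded).
condition_onsets = [
    ("p1", "hypertension", date(2007, 1, 10)),
    ("p1", "type 2 diabetes", date(2013, 9, 2)),
    ("p2", "hypertension", date(2020, 1, 5)),
]


def years_between(start, end):
    """Approximate elapsed time in years between two dates."""
    return (end - start).days / 365.25


def conditions_within(index_dates, onsets, horizon_years):
    """Map each patient to the conditions first recorded on or after the
    index diagnosis and no later than `horizon_years` afterwards."""
    result = {pid: set() for pid in index_dates}
    for pid, condition, onset in onsets:
        elapsed = years_between(index_dates[pid], onset)
        if 0 <= elapsed <= horizon_years:
            result[pid].add(condition)
    return result


within_5 = conditions_within(index_diagnosis, condition_onsets, 5)
within_10 = conditions_within(index_diagnosis, condition_onsets, 10)
```

In this toy example, only hypertension falls inside the 5-year window for patient p1, while the 10-year window also captures type 2 diabetes; patient p2 has no condition recorded within either window.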

Planned research methods and training provided: BHI training and KCL early career training opportunities. Specific training on NLP and predictive statistical modelling (including machine learning techniques).

Objectives / project plan:

Year 1 (Objective 1): Literature review. CRIS data retrieval and preparation (including NLP techniques).

Year 2 (Objective 2/3): Data analysis to identify the most common patterns of multimorbidity progression and associations with potential risk factors at the time of diagnosis. Paper submissions.

Year 3 (Objective 3): Data analysis to identify individuals at higher risk of disability. Write-up of the thesis and of a BRC report on potential clinical implications.
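
The clustering step in the plan above could be prototyped along the following lines. This is a hedged sketch, not the project's chosen method: it assumes each patient is summarised by binary indicators of conditions recorded within 5 and 10 years of diagnosis, and it uses a plain k-means procedure as a stand-in for whichever machine learning model is eventually selected. The feature names, patient rows and starting centroids are hypothetical.

```python
# Hypothetical toy data: one row per patient, one column per indicator.
FEATURES = ["hypertension_5y", "diabetes_5y", "hypertension_10y", "diabetes_10y"]
patients = {
    "p1": [1, 0, 1, 0],
    "p2": [1, 0, 1, 1],
    "p3": [0, 0, 0, 0],
    "p4": [0, 0, 0, 1],
    "p5": [1, 1, 1, 1],
}


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(rows):
    """Column-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def k_means(rows, initial_centroids, max_iter=100):
    """Plain k-means: assign each row to its nearest centroid, recompute
    centroids as cluster means, and stop when assignments no longer change."""
    centroids = [list(c) for c in initial_centroids]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centroids)),
                key=lambda k: squared_distance(row, centroids[k]))
            for row in rows
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for k in range(len(centroids)):
            members = [row for row, lab in zip(rows, labels) if lab == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = mean_vector(members)
    return labels, centroids


rows = list(patients.values())
# Deterministic start (an assumption for reproducibility): seed with p1 and p3.
labels, centroids = k_means(rows, [patients["p1"], patients["p3"]])
clusters = dict(zip(patients, labels))
```

With these toy rows, patients with early hypertension (p1, p2, p5) end up in one cluster and patients with little or no recorded multimorbidity (p3, p4) in the other. Real analyses would need a method suited to longitudinal, high-dimensional EHR data and a principled choice of the number of clusters.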

Scientific themes: 1) Learning from Big Data for Health. This project uses large, distributed, heterogeneous data sources such as CRIS EHRs to address major public health challenges such as dementia.

Datasets

The CRIS system enables access to anonymised electronic patient records from SLaM for secondary analysis and has full ethical approval. CRIS was developed with extensive involvement from service users and adheres to strict governance frameworks managed by service users. It has passed a robust ethics approval process acutely attentive to the use of patient data. Specifically, this system was approved as a dataset for secondary data analysis on this basis by Oxfordshire Research Ethics Committee C (08/H06060/71). The data are de-identified and used in a data-secure format, and all patients have the choice to opt out of their anonymised data being used. The CRIS Oversight Committee is responsible for ensuring that all research applications comply with ethical and legal guidelines.

Keywords

Machine Learning; Multimorbidity; Electronic Health Records