Using Topological-Machine Learning Data Analysis for feature extraction and outcomes prediction in data from smartphones and wearable devices

Lead Supervisor
Dr Raquel Iniesta
Senior Lecturer in Statistical Learning
Biostatistics and Health Informatics Department. Institute of Psychiatry, Psychology and Neuroscience (IoPPN)
King’s College London

Dr Nicholas Cummins

Project Details

The rich set of sensors in smartphones and wearable devices makes it possible to passively collect streams of data in the wild. The raw data streams, however, can rarely be used directly in a modelling pipeline. Processing often focuses on extracting features that summarise information from only a few data channels, e.g., location data only, phone usage traces only, or location and phone usage together. The extracted feature sets have also been far from comprehensive: they do not exploit all of the time-point information available, and the extraction process is often reported without sufficient detail. Topological Data Analysis (TDA), based on persistent homology and the Mapper algorithm, is a novel approach that infers intrinsic properties from the “shape” of the data. This PhD project will investigate a topological framework that can process raw data streams and extract useful features based on an optimal selection of time-point resolution over different temporal slices. The resulting framework is intended to optimise feature extraction from smartphone and wearable-device data in order to improve the performance of downstream unsupervised and supervised Machine Learning tasks, such as anomaly detection and outcomes prediction in patients with multiple sclerosis, depression, psychosis and epilepsy.
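As a rough illustration of what a persistent-homology feature can look like for a single sensor channel, the sketch below computes 0-dimensional sublevel-set persistence of a one-dimensional signal and summarises it as a few numbers. It is a minimal example under stated assumptions, not the framework the project will develop: the choice of a sublevel-set filtration, the function names `sublevel_persistence` and `persistence_features`, and the toy signal are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def sublevel_persistence(signal: Sequence[float]) -> List[Tuple[float, float]]:
    """0-dimensional sublevel-set persistence of a 1-D signal.

    Returns finite (birth, death) pairs. The component born at the global
    minimum never dies and is omitted. Zero-persistence pairs are skipped.
    """
    n = len(signal)
    parent = [-1] * n   # -1 marks samples not yet added to the filtration
    birth = [0.0] * n   # birth value, meaningful at component roots
    pairs: List[Tuple[float, float]] = []

    def find(i: int) -> int:
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Add samples from lowest to highest value.
    for i in sorted(range(n), key=lambda k: signal[k]):
        parent[i] = i
        birth[i] = signal[i]
        for j in (i - 1, i + 1):
            if 0 <= j < n and parent[j] != -1:
                ri, rj = find(i), find(j)
                if ri == rj:
                    continue
                # Elder rule: the component with the later birth dies here.
                young, old = (ri, rj) if birth[ri] > birth[rj] else (rj, ri)
                if birth[young] < signal[i]:
                    pairs.append((birth[young], signal[i]))
                parent[young] = old
    return pairs


def persistence_features(pairs: List[Tuple[float, float]]) -> Dict[str, float]:
    """Summarise a persistence diagram as a small, fixed-length feature set."""
    lifetimes = [death - birth for birth, death in pairs]
    return {
        "n_pairs": float(len(lifetimes)),
        "total_persistence": float(sum(lifetimes)),
        "max_persistence": float(max(lifetimes, default=0.0)),
    }


if __name__ == "__main__":
    # Hypothetical toy channel, e.g. a short window of activity counts.
    toy_signal = [0, 2, 1, 3, 0.5, 4]
    diagram = sublevel_persistence(toy_signal)
    print(diagram)
    print(persistence_features(diagram))
```

Each pair records the value at which a local minimum appears and the value at which its basin merges into an older one, so long-lived pairs correspond to prominent dips in the signal while short-lived pairs mostly reflect noise. Applying such a summary over windows of different lengths is one simple way to explore the effect of temporal resolution on the extracted features.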


This project will involve two different data sources.

(1) RADAR-CNS: one of the largest remote disease-monitoring studies in Europe, involving around 1500 individuals with brain disorders (multiple sclerosis, epilepsy and depression). The study collects and processes data from mobile devices to identify biomarkers that predict relapse or deterioration (e.g., changes in sleep, physical activity, cognition, memory).
(2) CrossCheck: a multimodal data collection system designed to aid in continuous remote monitoring and identification of subjective and objective indicators of psychotic relapse.

Data access has been granted and ethical approval has been obtained.


Topological Data Analysis; Machine Learning; Feature extraction; Prediction; Smartphones; Wearable devices