Clinical decision support in management of thoracic malignancy through multi-omic data science

Lead Supervisor
Dr Sophia Tsoka
Reader in Bioinformatics
Faculty of Natural, Mathematical & Engineering Sciences
King’s College London
sophia.tsoka@kcl.ac.uk

Co-supervisors
Professor Vicky Goh, Professor Gary Cook

Project Details

1. Problem statement 

Earlier diagnosis, better prognostication and individualised treatment planning remain acute challenges in cancer, where there is a continuing clinical need to improve patient outcomes. This is particularly true for thoracic cancers such as lung cancer, mesothelioma and oesophageal cancer, which are associated with a poor prognosis despite multimodality ‘curative’ treatment (≤10% 10-year survival rate). It is clear that no single data modality will suffice to improve clinical decision support, so research into multi-omic data integration is timely and requires multi-disciplinary contribution. In this project, we plan a multi-pronged approach to aid subtype discovery and patient risk stratification. We propose several avenues for the analysis of, and inference from, multi-source biomedical data. The envisioned tasks will operate in complementary unsupervised and supervised modes and will contribute towards a system for clinical decision support and prognostic application.

2. Proposed Workplan 

Multiscale data from multimodal diagnostic devices, combined with big data and Artificial Intelligence analytics, are likely not only to improve early detection but also to provide new signatures for accurate selection of at-risk subgroups and of suitable candidates for personalised prevention strategies and early treatment in cancer. Machine learning approaches provide powerful means not only to extract more meaningful features and create better predictive models, but also to integrate multiple data sources. This project will deliver an integrated platform for analysis of imaging and omics data to determine the value of combining complementary information, such as radiomics and genomics, to boost predictive performance and aid personalised intervention in thoracic malignancies.

Task 1: Familiarisation with data, development of data management and representation

Data integration and management for clinical applications is particularly challenging because the data are complex and heterogeneous, arise from diverse sources of measurement and require different types of processing. Before meaningful analysis through machine learning, storing and managing the data appropriately in a single resource will facilitate understanding of the data and exploratory analyses.

Task 2: Data integration for multi-omic analysis

Using multiple types of radiomic, genomic and clinicopathological data, distinct molecular subtypes of thoracic cancers will be identified by combining data from different layers such as copy number variation (CNV), mutation, DNA methylation, transcriptomics (mRNA expression and microRNA [miRNA] expression) and imaging. This integrative analysis will produce a comprehensive catalogue of genetic and epigenetic drivers of patient subtypes. Multivariate methods based on matrix factorisation principles and transformation-based methods will be explored for simultaneous integration of data, to derive multi-omics biomarkers that are predictive of disease, characterise disease phenotypes and reflect molecular patterns that span biological domains.
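As an illustration of what simultaneous integration by matrix factorisation could look like, the sketch below standardises each data layer, rescales the layers so that none dominates by size alone, concatenates them and takes a truncated singular value decomposition. This is only one simple member of the matrix factorisation family and is not necessarily the method the project will adopt. The function name `joint_factorise`, the layer names and the synthetic data are all assumptions made for the example.

```python
import numpy as np


def joint_factorise(layers, n_factors):
    """Shared latent factors across data layers via truncated SVD.

    `layers` maps a layer name to a (patients x features) array; every
    layer must list the same patients in the same row order.
    """
    blocks = []
    for values in layers.values():
        x = np.asarray(values, dtype=float)
        centred = x - x.mean(axis=0)
        sd = centred.std(axis=0)
        z = centred / np.where(sd > 0, sd, 1.0)  # column-wise z-scores
        norm = np.linalg.norm(z)                 # Frobenius norm of the layer
        blocks.append(z / norm if norm > 0 else z)

    stacked = np.hstack(blocks)
    u, s, vt = np.linalg.svd(stacked, full_matrices=False)

    # Patient scores on the shared factors.
    scores = u[:, :n_factors] * s[:n_factors]

    # Split the feature loadings back into one matrix per layer.
    loadings, start = {}, 0
    for name, values in layers.items():
        width = np.asarray(values).shape[1]
        loadings[name] = vt[:n_factors, start:start + width].T
        start += width
    return scores, loadings


# Synthetic example (hypothetical data): 20 patients in two subtypes whose
# signal is present in both an expression layer and a radiomics layer.
rng = np.random.default_rng(0)
subtype = np.repeat([-1.0, 1.0], 10)
expression = np.outer(subtype, [1.0, -1.0, 0.5, 0.8, -0.6]) + 0.1 * rng.normal(size=(20, 5))
radiomics = np.outer(subtype, [0.7, -0.9, 1.0]) + 0.1 * rng.normal(size=(20, 3))

scores, loadings = joint_factorise(
    {"expression": expression, "radiomics": radiomics}, n_factors=2
)
```

In this toy setting the first column of `scores` separates the two simulated subtypes, and the per-layer `loadings` indicate which features in each layer drive that factor. Real multi-omic data would additionally need handling of missing values, batch effects and layer-specific noise models.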

Task 3: Predictive Models and Causal Inference 

This task will build on the biomarkers generated in Task 2 and will employ Bayesian Network (BN) inference and machine learning to model conditional statistical dependencies among radiomic, molecular and genomic features. In addition to BN inference, other machine learning methodologies such as decision tree ensemble methods (gradient boosted trees, random forests) will offer interpretable and more scalable approaches for feature reduction and prioritisation, thereby enhancing and complementing inference through BNs.
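To make the notion of conditional statistical dependency concrete, the sketch below estimates conditional mutual information between discretised features. Checks of this kind underlie constraint-based BN structure learning, although the proposal does not fix a particular learning algorithm, so this is an assumption made for illustration. The variables X, Z and Y are hypothetical stand-ins (for instance a radiomic feature, a molecular marker and an outcome), and the counts are constructed by hand so that X influences Y only through Z.

```python
from collections import Counter
from math import log


def conditional_mutual_information(xs, ys, zs):
    """Plug-in estimate of I(X; Y | Z) in nats for discrete samples."""
    n = len(xs)
    n_xyz = Counter(zip(xs, ys, zs))
    n_xz = Counter(zip(xs, zs))
    n_yz = Counter(zip(ys, zs))
    n_z = Counter(zs)
    total = 0.0
    for (x, y, z), c in n_xyz.items():
        # p(x,y,z) * log[ p(x,y,z) p(z) / (p(x,z) p(y,z)) ], in counts.
        total += (c / n) * log((c * n_z[z]) / (n_xz[(x, z)] * n_yz[(y, z)]))
    return total


# Hypothetical counts keyed by (x, z, y) for a chain X -> Z -> Y:
# Z copies X three times out of four, and Y copies Z three times out of four.
counts = {
    (0, 0, 0): 9, (0, 0, 1): 3, (0, 1, 0): 1, (0, 1, 1): 3,
    (1, 0, 0): 3, (1, 0, 1): 1, (1, 1, 0): 3, (1, 1, 1): 9,
}
records = [key for key, c in counts.items() for _ in range(c)]
xs = [r[0] for r in records]
zs = [r[1] for r in records]
ys = [r[2] for r in records]

# Marginal dependence: condition on a constant, which reduces to I(X; Y).
mi_marginal = conditional_mutual_information(xs, ys, [0] * len(xs))

# Dependence that remains once Z is known.
cmi_given_z = conditional_mutual_information(xs, ys, zs)
```

Here X and Y are marginally dependent, but the dependence vanishes once Z is known, which is the pattern a BN would encode by omitting a direct edge between X and Y. With real data the estimate is noisy, so a statistical test or a scoring criterion would be needed rather than a comparison with zero.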