Clinical decision support in management of thoracic malignancy through multi-omic data science
Dr Sophia Tsoka
Reader in Bioinformatics
Faculty of Natural, Mathematical & Engineering Sciences
King’s College London
Professor Vicky Goh, Professor Gary Cook
1. Problem statement
Earlier diagnosis, better prognostication and individualised treatment planning remain acute challenges in cancer, where there is still a clinical need to improve patient outcomes. This is particularly true for thoracic cancers such as lung cancer, mesothelioma and oesophageal cancer, which carry a poor prognosis despite multimodality ‘curative’ treatment (≤10% 10-year survival rate). No single data modality is likely to suffice for improved clinical decision support, so research into multi-omic data integration is timely and requires multi-disciplinary contribution. In this project, we plan a multi-pronged approach to subtype discovery and patient risk stratification, proposing several avenues for the analysis of multi-source biomedical data and for inference. The planned tasks operate in unsupervised and supervised modes that complement each other, and together they will contribute towards a system for clinical decision support and prognostic application.
2. Proposed Workplan
Multiscale data from multimodal diagnostic devices, in combination with big data and Artificial Intelligence analytics, are likely not only to improve early detection but also to provide new signatures for accurate selection of at-risk subgroups and of suitable candidates for personalised prevention strategies and early treatments in cancer. Machine learning approaches provide powerful means not only to extract more meaningful features and create better predictive models, but also to integrate multiple data sources. This project will deliver an integrated platform for the analysis of imaging and omics data, to determine the value of combining complementary information, such as radiomics and genomics, to boost predictive performance and aid personalised intervention in thoracic malignancies.
Task 1: Familiarisation with data, development of data management and representation
Data integration and management in the context of clinical applications is particularly challenging because the data are complex and heterogeneous, arise from diverse sources of measurement and require different types of processing. Before meaningful analysis through machine learning, appropriate storage and management of the data in a single resource will facilitate understanding of the data and exploratory analyses.
Task 2: Data integration for multi-omic analysis
Using multiple types of radiomic, genomic and clinicopathological data, distinct molecular subtypes of thoracic cancers will be identified by combining data from different layers such as copy number variation (CNV), mutation, DNA methylation, transcriptomics (mRNA expression and microRNA [miRNA] expression) and imaging. This integrative analysis will produce a comprehensive catalogue of genetic and epigenetic drivers of patient subtypes. Multivariate methods based on matrix factorisation principles and transformation-based methods will be explored for the simultaneous integration of these data, in order to derive multi-omics biomarkers that are predictive of disease, characterise disease phenotypes and reflect molecular patterns that span biological domains.
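As an illustration only, the matrix factorisation idea can be sketched as a truncated SVD of standardised, concatenated data blocks. This is a minimal sketch under stated assumptions and not the method the project commits to: the function names (zscore, joint_factorisation), the block scaling by Frobenius norm, and the synthetic "radiomics" and "expression" matrices are all hypothetical and chosen purely for demonstration.

```python
import numpy as np


def zscore(block):
    """Centre each feature (column) and scale it to unit variance."""
    centred = block - block.mean(axis=0)
    std = centred.std(axis=0)
    std[std == 0] = 1.0  # guard against constant features
    return centred / std


def joint_factorisation(blocks, n_factors):
    """Factorise several patient-by-feature blocks with one shared SVD.

    Returns patient scores (n_patients x n_factors) and a list with one
    loading matrix (n_features_in_block x n_factors) per input block.
    """
    scaled = []
    for block in blocks:
        z = zscore(np.asarray(block, dtype=float))
        # Assumption: divide by the Frobenius norm so that a block with
        # many features does not dominate the shared factors.
        scaled.append(z / np.linalg.norm(z))
    stacked = np.hstack(scaled)
    u, s, vt = np.linalg.svd(stacked, full_matrices=False)
    scores = u[:, :n_factors] * s[:n_factors]
    # Split the joint loadings back into one matrix per block.
    edges = np.cumsum([b.shape[1] for b in scaled])[:-1]
    loadings = np.split(vt[:n_factors].T, edges, axis=0)
    return scores, loadings


# Synthetic data (hypothetical): 60 patients in two hidden subtypes.
rng = np.random.default_rng(0)
subtype = np.repeat([-1.0, 1.0], 30)
radiomics = np.outer(subtype, rng.normal(size=20)) + rng.normal(size=(60, 20))
expression = np.outer(subtype, rng.normal(size=50)) + rng.normal(size=(60, 50))

scores, loadings = joint_factorisation([radiomics, expression], n_factors=2)
agreement = abs(np.corrcoef(scores[:, 0], subtype)[0, 1])
print(f"|correlation| between first shared factor and subtype: {agreement:.2f}")
```

In this toy setting the first shared factor separates the two simulated subtypes, and the per-block loadings indicate which features in each layer contribute to it. Real multi-omic data would additionally require handling of missing values, batch effects and differing noise models per layer.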
Task 3: Predictive Models and Causal Inference
This task will build on the biomarkers generated in Task 2 and will employ Bayesian Network (BN) inference and machine learning to model conditional statistical dependencies among radiomic, molecular and genomic features. In addition to BN inference, other machine learning methodologies such as decision tree ensemble methods (gradient boosted trees, random forest) will offer interpretable and more scalable approaches for feature reduction and prioritisation, thereby enhancing and complementing inference through BNs.
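To make the notion of a conditional statistical dependency concrete, the following minimal sketch uses Gaussian partial correlations, a much simpler stand-in for full BN inference. It is an illustration under stated assumptions: the helper partial_correlations and the three simulated variables (mutation_burden, texture, expression) are hypothetical, and the simulated structure (one genomic feature driving both a radiomic and a molecular feature) is invented for the example.

```python
import numpy as np


def partial_correlations(data):
    """Partial correlation of every pair of columns given all other columns.

    Computed from the precision matrix P (inverse covariance):
    rho_ij = -P_ij / sqrt(P_ii * P_jj).
    """
    precision = np.linalg.inv(np.cov(data, rowvar=False))
    scale = np.sqrt(np.diag(precision))
    partial = -precision / np.outer(scale, scale)
    np.fill_diagonal(partial, 1.0)
    return partial


# Simulated data (hypothetical): a genomic feature influences both a
# radiomic feature and a molecular feature, which do not influence each other.
rng = np.random.default_rng(1)
n = 5000
mutation_burden = rng.normal(size=n)
texture = 0.8 * mutation_burden + rng.normal(size=n)
expression = 0.8 * mutation_burden + rng.normal(size=n)

data = np.column_stack([mutation_burden, texture, expression])
marginal = np.corrcoef(data, rowvar=False)
partial = partial_correlations(data)

print(f"marginal corr(texture, expression): {marginal[1, 2]:.2f}")
print(f"partial corr(texture, expression | mutation_burden): {partial[1, 2]:.2f}")
```

Here the radiomic and molecular features are correlated marginally, yet their partial correlation is close to zero once the genomic feature is accounted for. Under Gaussian assumptions this is the kind of conditional independence that a BN would encode by omitting a direct edge between the two features.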