Data science methodologies for discovering drugs and targets in amyotrophic lateral sclerosis
Lead Supervisor
Dr Sophia Tsoka
Reader in Bioinformatics
Faculty of Natural, Mathematical & Engineering Sciences
King’s College London
sophia.tsoka@kcl.ac.uk
Co-supervisors
Prof Khuloud Al-Jamal, Dr Jemeen Sreedharan
Project Details
Amyotrophic Lateral Sclerosis (ALS) is a neurodegenerative condition with a poor prognosis and, at present, limited therapeutic options. Despite research efforts, its aetiology remains elusive and drug development is confounded by the lack of accurate monitoring markers [1]. Disease heterogeneity, late-stage recruitment into pharmaceutical trials, and inclusion of phenotypically admixed patient cohorts are some of the key barriers to successful clinical trials. Computational methods offer promising avenues to facilitate and improve the evaluation of diagnostic and prognostic markers, as well as to suggest effective drugs.
This project will focus on exploratory models based on network analysis [2] and predictive methods based on machine learning [3]. The key objectives of the workplan are: (i) building protein interaction networks for ALS omic datasets and analysing them to derive specific molecular targets, (ii) integrated analysis of omic and clinical data via machine learning to derive prognostic markers (for an example, see [4]), and (iii) performing drug repurposing tasks to propose suitable ligands for ALS targets with pharmaceutical potential. An indication of the implementation tasks is outlined below.
Task 1: Integration of datasets from multiple biological resources. Public repositories will be used to construct a unified data resource for analysis. This resource will bring together multiple data sources, such as genomic and interaction databases, metabolic and signalling pathways, and functional annotations, as well as tissue profiling through RNA sequencing in ALS. The use of graph database frameworks will be explored as a scalable option that handles data heterogeneity and integration well.
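As an illustration of the kind of integration Task 1 describes, the sketch below merges three toy sources into one typed graph held in memory. It is not the project's implementation and does not use any particular graph database: the `HeteroGraph` class, the edge type names and all gene, pathway and tissue identifiers are hypothetical placeholders chosen for this example, and the expression values are made up.

```python
from collections import defaultdict

# Toy placeholder records (hypothetical names and values, not real data),
# standing in for rows exported from three different public resources.
interaction_rows = [("GENE_A", "GENE_B"), ("GENE_B", "GENE_C")]
pathway_rows = [("GENE_A", "PATHWAY_X"), ("GENE_C", "PATHWAY_X")]
expression_rows = [("GENE_A", "TISSUE_1", 8.2), ("GENE_C", "TISSUE_1", 6.9)]


class HeteroGraph:
    """Minimal typed graph: node ids are (label, name), edges carry a type."""

    def __init__(self):
        self.nodes = set()
        self.edges = defaultdict(list)  # source -> [(edge type, target, properties)]

    def add_edge(self, source, edge_type, target, **properties):
        self.nodes.update((source, target))
        self.edges[source].append((edge_type, target, properties))

    def neighbours(self, source, edge_type):
        # Targets reachable from `source` through edges of one type.
        return [t for (e, t, _) in self.edges.get(source, []) if e == edge_type]


def build_graph(interactions, pathways, expression):
    graph = HeteroGraph()
    for a, b in interactions:
        # Protein interactions are undirected, so store both directions.
        graph.add_edge(("Gene", a), "INTERACTS_WITH", ("Gene", b))
        graph.add_edge(("Gene", b), "INTERACTS_WITH", ("Gene", a))
    for gene, pathway in pathways:
        graph.add_edge(("Gene", gene), "MEMBER_OF", ("Pathway", pathway))
    for gene, tissue, level in expression:
        # The expression level is kept as an edge property.
        graph.add_edge(("Gene", gene), "EXPRESSED_IN", ("Tissue", tissue), level=level)
    return graph


graph = build_graph(interaction_rows, pathway_rows, expression_rows)
print(sorted(graph.nodes))
print(graph.neighbours(("Gene", "GENE_B"), "INTERACTS_WITH"))
```

Prefixing each node id with its label keeps identifiers from different sources from colliding, which is one simple way to cope with heterogeneous inputs; a graph database would typically express the same idea through node labels and relationship types.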
Task 2: Analysis through detection of composite communities. Previously, we reported the development of a combinatorial optimisation method for consensus graph clustering [5], in which multilayer networks corresponding to diverse sources of interactions were combined to determine a single representative partition of composite communities. Here, we envisage extending this work so that it can be applied to drug and protein target interactions related to ALS.
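To make the idea of a single partition evaluated against several network layers concrete, the sketch below scores candidate partitions by summing standard Newman modularity over layers. This is an assumption made for illustration only: it is not claimed to be the objective or the optimisation procedure of [5], no optimisation is performed, and the layer names, node names and edges are toy placeholders.

```python
def modularity(edges, membership):
    """Newman modularity of one undirected, unweighted layer.

    Q = sum over communities c of [ L_c / m - (d_c / (2 m))^2 ],
    where m is the number of edges, L_c the number of edges inside c
    and d_c the total degree of the nodes in c.
    """
    m = len(edges)
    within = {}
    degree = {}
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            within[cu] = within.get(cu, 0) + 1
    return sum(within.get(c, 0) / m - (d / (2 * m)) ** 2 for c, d in degree.items())


def multilayer_score(layers, membership):
    # Assumed composite score: the plain sum of per-layer modularity.
    return sum(modularity(edges, membership) for edges in layers.values())


# Two toy layers over the same six nodes (hypothetical interaction sources).
layers = {
    "layer_1": [("a", "b"), ("b", "c"), ("a", "c"),
                ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")],
    "layer_2": [("a", "b"), ("b", "c"), ("d", "e"), ("e", "f"), ("c", "d")],
}

partition_1 = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
partition_2 = {"a": 0, "b": 0, "d": 0, "c": 1, "e": 1, "f": 1}

for name, partition in [("partition_1", partition_1), ("partition_2", partition_2)]:
    print(name, round(multilayer_score(layers, partition), 4))
```

A consensus clustering method would search over partitions for the one with the best composite score; here the two candidates are simply compared, which is enough to show that a partition agreeing with the structure of both layers scores higher than one that cuts across it.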
Task 3: Link prediction in heterogeneous graphs through machine learning. We will model the above data, integrated from various databases, as a heterogeneous graph in which nodes may be genes, GO annotations and cell phenotypes, and edges represent the relationships among the nodes. Recent advances in graph neural networks (e.g., graph convolutional networks) make it possible to predict potential relationships among the nodes (i.e., link prediction) and thereby derive new associations between target genes in ALS and potential drugs.
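The link prediction task can be illustrated with a much simpler stand-in than a graph neural network. The sketch below is a neighbourhood-overlap baseline, not a graph convolutional network: it ranks candidate drug-gene links by how many of a drug's known targets interact with the candidate gene. All drug and gene names, the known target sets and the interactions are hypothetical placeholders, and the scoring rule is an assumption chosen for clarity.

```python
from collections import defaultdict

# Toy heterogeneous graph (hypothetical names, not real associations).
known_targets = {
    "DRUG_1": {"GENE_A", "GENE_B"},
    "DRUG_2": {"GENE_C"},
}
gene_interactions = [
    ("GENE_A", "GENE_D"),
    ("GENE_B", "GENE_D"),
    ("GENE_B", "GENE_E"),
    ("GENE_C", "GENE_E"),
]


def build_adjacency(pairs):
    """Undirected gene-gene adjacency as a dict of neighbour sets."""
    adjacency = defaultdict(set)
    for u, v in pairs:
        adjacency[u].add(v)
        adjacency[v].add(u)
    return adjacency


def score_candidates(drug, targets, adjacency):
    """Rank genes that are not yet targets of `drug`.

    The score of a candidate gene is the number of known targets of the
    drug that it interacts with (a common-neighbour style heuristic).
    """
    known = targets[drug]
    scores = defaultdict(int)
    for gene in known:
        for neighbour in adjacency.get(gene, ()):
            if neighbour not in known:
                scores[neighbour] += 1
    # Highest score first; ties broken alphabetically for reproducibility.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


adjacency = build_adjacency(gene_interactions)
for drug in sorted(known_targets):
    print(drug, score_candidates(drug, known_targets, adjacency))
```

A learned model such as a graph convolutional network would replace the hand-written score with one computed from node embeddings, but the input (a typed graph with some known links) and the output (a ranked list of candidate links) have the same shape as in this baseline.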
Overall, the project will offer significant novelty both in the development of computational methodologies, encompassing machine learning, combinatorial optimisation and graph data mining, and in the clinical insights it provides for prognosis and drug discovery in ALS.
Keywords: drug discovery, machine learning, protein interactions, network analysis
References
1. V. Grollemund et al., “Machine Learning in Amyotrophic Lateral Sclerosis: Achievements, Pitfalls, and Future Directions”, Front Neurosci. 2019 Feb 28;13:135. doi: 10.3389/fnins.2019.00135.
2. A.G. Thompson et al., “Network Analysis of the CSF Proteome Characterizes Convergent Pathways of Cellular Dysfunction in ALS”, Front Neurosci. 2021 Mar 17;15:642324. doi: 10.3389/fnins.2021.642324.
3. F. Faghri et al., “Identification and prediction of ALS subgroups using machine learning”, medRxiv 2021.04.02.21254844. doi: 10.1101/2021.04.02.21254844.
4. E. Amiri Souri, A. Chenoweth, A. Cheung, S. N. Karagiannis, S. Tsoka, “Cancer Grade Model: a multi-gene machine learning-based risk classification for improving prognosis in breast cancer”, Br J Cancer. 2021 Aug;125(5):748-758. doi: 10.1038/s41416-021-01455-1.
5. L. Bennett, A. Kittas, G. Muirhead, L. G. Papageorgiou, S. Tsoka, “Detection of composite communities in multiplex biological networks”, Sci Rep. 2015;5:10345.