Data science methodologies for discovering drugs and targets in amyotrophic lateral sclerosis

Lead Supervisor
Dr Sophia Tsoka
Reader in Bioinformatics
Faculty of Natural, Mathematical & Engineering Sciences
King’s College London
sophia.tsoka@kcl.ac.uk

Co-supervisors
Prof Khuloud Al-Jamal, Dr Jemeen Sreedharan

Project Details

Amyotrophic Lateral Sclerosis (ALS) is a neurodegenerative condition with a poor prognosis and, at present, limited therapeutic options. Despite research efforts, its aetiology remains elusive, and drug development is confounded by the lack of accurate monitoring markers [1]. Disease heterogeneity, late-stage recruitment into pharmaceutical trials, and the inclusion of phenotypically admixed patient cohorts are some of the key barriers to successful clinical trials. Computational methods offer promising avenues to facilitate and improve the evaluation of diagnostic and prognostic markers, as well as to suggest effective drugs.

This project will focus on exploratory models based on network analysis [2] and predictive methods based on machine learning [3]. The workplan has three key aims: (i) building protein interaction networks for ALS omic datasets and analysing them to derive specific molecular targets, (ii) integrated analysis of omic and clinical data via machine learning to derive prognostic markers (see, for example, [4]), and (iii) performing drug repurposing tasks to propose suitable ligands for ALS targets with pharmaceutical potential. An indication of the implementation tasks is outlined below.

Task 1: Integration of datasets from multiple biological resources. Public repositories will be used to construct a unified data resource for analysis. This resource will encompass multiple data types, such as genomic and interaction databases, metabolic and signalling pathways, functional annotations, and tissue profiling through RNA sequencing in ALS. The use of graph database frameworks will be explored as a scalable option that handles data heterogeneity and integration well.
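The integration step in Task 1 can be pictured as merging several sources into one graph with typed nodes and typed edges, which is the shape of data a graph database stores. The following is a minimal sketch in plain Python, not a proposed implementation: the records, the node labels (Gene, GOTerm), the edge types (INTERACTS_WITH, ANNOTATED_WITH) and the build_graph helper are all hypothetical, and the values are toy placeholders rather than entries taken from any database.

```python
from collections import defaultdict

# Hypothetical toy records standing in for exports from public resources.
# The identifiers and values are illustrative placeholders, not curated data.
interactions = [("SOD1", "CCS"), ("TARDBP", "FUS")]            # protein-protein pairs
annotations = [("SOD1", "GO:0006979"), ("FUS", "GO:0003723")]  # gene -> GO term
expression = {"SOD1": 2.1, "TARDBP": -0.4, "FUS": 1.3}         # made-up log fold changes


def build_graph(interactions, annotations, expression):
    """Merge the three sources into one graph with typed nodes and typed edges."""
    nodes = {}                # node id -> property dict
    edges = defaultdict(set)  # edge type -> set of (source, target) pairs

    def add_node(node_id, label):
        # Create the node once; later sources only add properties to it.
        nodes.setdefault(node_id, {"label": label})

    for a, b in interactions:
        add_node(a, "Gene")
        add_node(b, "Gene")
        # Interactions are undirected, so store each pair in sorted order.
        edges["INTERACTS_WITH"].add(tuple(sorted((a, b))))

    for gene, term in annotations:
        add_node(gene, "Gene")
        add_node(term, "GOTerm")
        edges["ANNOTATED_WITH"].add((gene, term))

    for gene, log_fold_change in expression.items():
        add_node(gene, "Gene")
        nodes[gene]["log_fold_change"] = log_fold_change

    return nodes, dict(edges)


nodes, edges = build_graph(interactions, annotations, expression)
for node_id, props in sorted(nodes.items()):
    print(node_id, props)
```

Keeping node and edge types explicit is what allows later steps to treat genes, annotations and drugs differently within the same structure.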

Task 2: Analysis through detection of composite communities. Previously, we reported the development of a combinatorial optimisation method for consensus graph clustering [5], in which multilayer networks corresponding to diverse sources of interactions were combined to determine a single representative partition of composite communities. Here, we envisage extending this work to drug and protein target interactions related to ALS.
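To illustrate what a single representative partition across layers means, the sketch below scores a handful of candidate partitions by their Newman modularity summed over all layers and keeps the best one. This is a deliberate simplification and not the optimisation method of [5]: the choice of summed modularity as the quality measure, the two toy layers, the layer names and the hand-picked candidate partitions are all assumptions made for illustration.

```python
def modularity(edges, partition):
    """Newman modularity of a partition (node -> community id) on an
    undirected, unweighted edge list."""
    m = len(edges)
    intra = {}   # community -> number of edges with both ends inside it
    degree = {}  # community -> summed degree of its member nodes
    for u, v in edges:
        cu, cv = partition[u], partition[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            intra[cu] = intra.get(cu, 0) + 1
    return sum(intra.get(c, 0) / m - (d / (2 * m)) ** 2 for c, d in degree.items())


def multilayer_score(layers, partition):
    """Quality of one partition across all layers: the sum of per-layer modularity."""
    return sum(modularity(edges, partition) for edges in layers.values())


# Two hypothetical layers over the same six nodes.
layers = {
    "protein_interaction": [("A", "B"), ("B", "C"), ("A", "C"),
                            ("D", "E"), ("E", "F"), ("D", "F"), ("C", "D")],
    "shared_pathway": [("A", "B"), ("A", "C"), ("D", "E"), ("D", "F")],
}

# Hand-picked candidate partitions; a real method would search this space.
candidates = {
    "two_groups": {"A": 0, "B": 0, "C": 0, "D": 1, "E": 1, "F": 1},
    "one_group": {node: 0 for node in "ABCDEF"},
    "three_pairs": {"A": 0, "B": 0, "C": 1, "D": 1, "E": 2, "F": 2},
}

scores = {name: multilayer_score(layers, p) for name, p in candidates.items()}
best = max(scores, key=scores.get)
print(best, round(scores[best], 3))
```

In this toy case the partition into {A, B, C} and {D, E, F} is supported by both layers, so it scores highest. An optimisation-based approach would search over partitions rather than enumerate a fixed list.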

Task 3: Link prediction in heterogeneous graphs through machine learning. We will model the above data, as integrated from various databases, as a heterogeneous graph in which nodes could be genes, GO annotations and cell phenotypes, and edges represent the relationships among the nodes. With the recent advances in graph neural networks (e.g., graph convolutional networks), we will be able to predict potential relationships among the nodes (i.e., link prediction) and thereby derive new associations between target genes in ALS and potential drugs.
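Before training a graph neural network, link prediction can be illustrated with a classical neighbourhood heuristic that such models are commonly compared against. The sketch below ranks unlinked drug and gene pairs by the Adamic-Adar index, which sums 1/log(degree) over the neighbours two nodes share. It is a stand-in baseline and not a graph convolutional network; the node names (drugX, G1 to G4, GO:a), the edges and the helper functions are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical heterogeneous graph: one drug, four genes, one GO term.
edges = [
    ("drugX", "G1"), ("drugX", "G2"),  # known drug-target links
    ("G1", "G3"), ("G2", "G3"),        # gene-gene interactions
    ("G1", "GO:a"), ("G4", "GO:a"),    # functional annotations
]


def build_adjacency(edges):
    """Undirected adjacency map: node -> set of neighbours."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def adamic_adar(adj, u, v):
    """Sum of 1/log(degree) over the common neighbours of u and v.
    A common neighbour is adjacent to both u and v, so its degree is at
    least 2 and the logarithm is strictly positive."""
    return sum(1.0 / math.log(len(adj[w])) for w in adj[u] & adj[v])


adj = build_adjacency(edges)

# Candidate links: drug-gene pairs that are not already connected.
genes = ["G1", "G2", "G3", "G4"]
candidate_pairs = [("drugX", g) for g in genes if g not in adj["drugX"]]

ranked = sorted(
    ((pair, adamic_adar(adj, *pair)) for pair in candidate_pairs),
    key=lambda item: item[1],
    reverse=True,
)
for pair, score in ranked:
    print(pair, round(score, 3))
```

Here G3 ranks above G4 because it shares two neighbours with drugX (both of the drug's known targets), whereas G4 shares none. A learned model could additionally exploit node types and longer paths, which this heuristic ignores.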

Overall, the project will offer significant novelty both in the development of computational methodologies encompassing machine learning, combinatorial optimisation and graph data mining, and in the clinical insights it provides for prognosis and drug discovery in ALS.

Keywords: drug discovery, machine learning, protein interactions, network analysis

References  

1. V. Grollemund et al., “Machine Learning in Amyotrophic Lateral Sclerosis: Achievements, Pitfalls, and Future Directions”, Front Neurosci. 2019 Feb 28;13:135. doi: 10.3389/fnins.2019.00135.

2. A.G. Thompson et al., “Network Analysis of the CSF Proteome Characterizes Convergent Pathways of Cellular Dysfunction in ALS”, Front Neurosci. 2021 Mar 17;15:642324. doi: 10.3389/fnins.2021.642324.

3. F. Faghri et al., “Identification and prediction of ALS subgroups using machine learning”, medRxiv 2021.04.02.21254844. doi: 10.1101/2021.04.02.21254844.

4. E. Amiri Souri, A. Chenoweth, A. Cheung, S. N. Karagiannis, S. Tsoka, “Cancer Grade Model: a multi-gene machine learning-based risk classification for improving prognosis in breast cancer”, Br J Cancer. 2021 Aug;125(5):748-758. doi: 10.1038/s41416-021-01455-1.

5. L. Bennett, A. Kittas, G. Muirhead, L. G. Papageorgiou, S. Tsoka, “Detection of composite communities in multiplex biological networks”, Sci Rep. 2015;5:10345.