Automated patient summarisation in electronic health records data

Lead Supervisor
Professor Richard Dobson
Professor of Medical Bioinformatics
Biostatistics and Health Informatics, King’s College London

Dr Dan Bean (King’s College London)

Project Details

Electronic health records contain detailed medical information in many formats, but free text remains the major source of data. State-of-the-art natural language processing (NLP) algorithms can now extract much of this information with a usable degree of accuracy. For each document, the output of these NLP methods is a large set of identifiers for specific medical concepts in an ontology (e.g. SNOMED). Every interaction a patient has with a hospital produces numerous documents, and the patient's medical state changes over time. As a clinician reviews a patient's records, they form an overall view of the patient's medical history and current status (e.g. diagnoses, medications), drawing on both explicit and implicit information. For example, a patient who has been prescribed insulin is likely to be diabetic, even if the diagnosis is not explicitly recorded. The aim of this project is to automate this patient summarisation process using a combination of graph databases and machine learning.

By structuring our knowledge of a patient's records as a graph, we can link specific concepts to general medical knowledge, such as the fact that kidney failure is a condition of the renal system or that insulin is prescribed for diabetes. Using these relationships, we can identify cases where our knowledge of a patient entails certain facts that are not explicit. However, the vast number of potential inferences means it is not feasible to identify them all manually. Therefore, we will also use machine learning to supplement these manually defined cases, enabling us to derive a summary of a patient at any point in their clinical history. This summary can either be presented to an end user directly (e.g. for an audit or trial recruitment) or used as input to further analysis (e.g. predicting disease risk).
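
As a rough illustration of the graph-based inference described above, the following minimal Python sketch links explicitly recorded concepts to a small, hand-written knowledge graph and follows its edges to collect implied concepts. Everything in it is an assumption made for illustration: the concept labels are plain strings rather than real SNOMED identifiers, the "diabetes is an endocrine condition" edge is added only to show chaining, the function names (infer_concepts, summarise) are hypothetical, and a plain dictionary stands in for a graph database. It also treats every edge as certain, whereas an inference such as "prescribed insulin implies diabetes" is only likely and would need to be weighted in practice.

```python
from collections import deque

# Hypothetical knowledge graph: concept -> list of (relation, implied concept).
# Labels are illustrative strings, not real ontology identifiers.
KNOWLEDGE_GRAPH = {
    "insulin": [("prescribed_for", "diabetes")],
    "kidney failure": [("is_a", "renal system condition")],
    "diabetes": [("is_a", "endocrine condition")],  # assumed edge, shows chaining
}


def infer_concepts(explicit):
    """Breadth-first walk from the explicit concepts.

    Returns a dict mapping each implied concept to the chain of
    (source, relation, target) edges that supports it.
    """
    inferred = {}
    seen = set(explicit)
    queue = deque((concept, []) for concept in sorted(explicit))
    while queue:
        concept, path = queue.popleft()
        for relation, target in KNOWLEDGE_GRAPH.get(concept, []):
            if target in seen:
                continue  # already explicit or already inferred
            seen.add(target)
            new_path = path + [(concept, relation, target)]
            inferred[target] = new_path
            queue.append((target, new_path))
    return inferred


def summarise(documents, at_day):
    """Summarise a patient using only documents dated on or before at_day.

    documents is a list of (day, set_of_concepts) pairs, where the concept
    sets stand in for the per-document NLP output.
    """
    explicit = set()
    for day, concepts in documents:
        if day <= at_day:
            explicit |= concepts
    return {"explicit": explicit, "inferred": infer_concepts(explicit)}


if __name__ == "__main__":
    # Two hypothetical documents: insulin mentioned on day 0,
    # kidney failure mentioned on day 10.
    docs = [(0, {"insulin"}), (10, {"kidney failure"})]
    summary = summarise(docs, at_day=5)
    for concept, path in summary["inferred"].items():
        print(concept, "<-", path)
```

Keeping the supporting edge chain alongside each inferred concept is a deliberate choice in this sketch: it lets an end user see why a fact was added to the summary, which matters when the summary is used for audit or trial recruitment.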


This project can be carried out entirely using open access data (MIMIC-III). If use cases in hospitals are identified, we will apply for access through the relevant local ethics committee (CRIS for South London and Maudsley, KERRI for King’s College Hospital).


Electronic health records, machine learning, knowledge graphs