Developing deep learning models to predict youth mental health problems from parents’ speech

Lead Supervisor
Dr Helen Fisher
Reader in Developmental Psychopathology
Social Genetic and Developmental Psychiatry Centre

Dr Johnny Downs – Clinical Senior Lecturer in Child and Adolescent Psychiatry, IoPPN, King’s College London
Dr Heidi Christensen – Senior Lecturer in Computer Science, Speech and Hearing Research Group (SPandH), University of Sheffield. Theme lead, UKRI CDT Speech and Language Technologies.

South London and Maudsley NHS Foundation Trust; Dalhousie University, Halifax, Canada; University of Bristol; University of Sheffield

Project Details

Youth mental health is a key determinant of human potential and instrumental in building healthy societies. Predicting mental health is the first step towards interventions that can improve it and allow young people (and the society they live in) to flourish. Substantial evidence suggests that youth mental health can be usefully predicted from as little as five minutes of audio of a parent speaking about their child. Brief speech samples are easy to collect and have rich informational content because both the words and sound of speech convey facts about the parent, the offspring, and the interactions within the family. Yet, this method of assessment is rarely used, because the coding of speech is laborious, bias-prone, and requires highly trained raters.

The aim of this project is to develop an automated machine-learning-based method of assessing emotional attitudes from parent speech samples in a way that is efficient, reproducible, and minimises biases, and to use this method to predict children’s future mental health outcomes. This research cuts across the health, engineering, and social sciences, and the candidate will join a thriving interdisciplinary team with expertise in developmental and social psychology, epidemiology, psychiatry, computer science, digital ethics, clinical informatics, linguistics, signal processing, and natural language processing. In this project we will leverage a combination of unique cohort study data and expertise in social science, mental health, and automated speech analyses.

The main research questions that will be examined within this PhD are:

  1. Can a deep learning model be developed to detect emotional attitudes of parents towards their children from brief samples of speech with comparable accuracy to highly trained human raters?
  2. Does the automated model perform well across sexes, socio-economic strata, accents, and countries?
  3. Can the model accurately predict which children will develop mental health problems in the future?
  4. What are the ethical, social, and practical challenges to developing and implementing such models?

In addition, the candidate will work with key stakeholders (parents, healthcare and social-work practitioners and policy-makers, and educators) throughout the project to determine the main ethical, social, and practical challenges to using and sharing parental speech data and to developing the automated models, so that these can inform the approach taken within the project.


The project uses data from the Environmental Risk (E-Risk) Longitudinal Twin Study, a unique UK cohort of parents and offspring that tracks the development of a nationally representative birth cohort of 2,232 British twin children born in England and Wales in 1994-1995. The sample was constructed in 1999-2000, when 1,116 families with same-sex 5-year-old twins (93% of those eligible) participated in home-visit assessments. Families were recruited to represent the UK population of families with new-borns in the 1990s, based on residential location throughout England and Wales and mothers’ age.

E-Risk families are representative of UK households across the spectrum of neighbourhood-level deprivation. The sample comprises 56% monozygotic and 44% dizygotic twin pairs, and sex is evenly distributed within zygosity (49% male). Speech samples that capture expressed emotion have been serially recorded and manually coded during follow-up home visits when children were aged 7, 10, 12, and 18 years (participation rates of 98%, 96%, 96%, and 93%, respectively). Dr Fisher is lead applicant on E-Risk and will oversee access to the data required for the PhD.


Speech analysis, Child Development, Expressed Emotion, Machine learning