Moving beyond word error rate to evaluate automatic speech recognition in clinical samples: Lessons from research into schizophrenia-spectrum disorders
Paper Details
Published: 2025/08/25
Journal: Psychiatry Research
Volume: 352
Number: 116690
DOI: 10.1016/j.psychres.2025.116690
Abstract
Natural language processing (NLP) applications in mental health research depend on automatic speech recognition (ASR) to study large samples and develop scalable clinical tools. To ensure safe and effective implementation, it is crucial to understand how ASR performs on speech from clinical populations. This study therefore evaluated ASR performance in N=50 speech samples from individuals with schizophrenia-spectrum disorders, finding word error rates (WER) ranging from 0.31 to 0.58. WER varied systematically with country of birth and severity of positive symptoms. In subsequent NLP analysis, ASR transcripts showed significantly higher GloVe semantic similarity and fewer sentences than manual transcripts, as well as weaker correlations between NLP metrics and symptom scores. We considered the potential impact of these differences in three real-world use cases of ASR: electronic health records, voice chatbots, and clinical decision support systems. Overall, we argue that assessing ASR performance requires looking beyond WER alone. In clinical settings, the potential impact of ASR errors depends not only on their rate but also on their type, meaning and context. Our approach shows how to evaluate ASR in clinical research and offers guidance for future researchers and developers on key considerations for its implementation.
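For readers unfamiliar with the metric, WER is conventionally the word-level edit distance (substitutions, deletions and insertions) between an ASR transcript and a reference transcript, divided by the number of reference words. The sketch below is a minimal illustration of that standard definition, not the evaluation pipeline used in the paper; the function name `word_error_rate`, the lowercase/whitespace tokenisation, and the example sentences are all assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Tokenisation here (lowercasing, whitespace split) is an assumption;
    real evaluations usually apply more careful text normalisation.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical sentences, not taken from the study's data.
reference = "i do not hear voices"
minor_error = "i do not hear voice"   # one substitution
critical_error = "i do hear voices"   # one deletion that reverses the meaning

wer_minor = word_error_rate(reference, minor_error)
wer_critical = word_error_rate(reference, critical_error)
```

In this made-up example both transcripts contain a single word error against a five-word reference, so they receive the same WER, yet only the second changes what the speaker reported. That is the kind of difference in error type and meaning that a rate alone cannot capture.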
Authors
Shrankhla Pandey
University of Cambridge
PhD student
Sarah Morgan
Cambridge University
Departmental Early Career Academic Fellow, Accelerate Programme
Sandra Anna Just
Brita Elvevåg
Ivan Nenchev
Anna-Lena Bröcker