Uncertainty as a predictor: Leveraging self-supervised learning for zero-shot MOS prediction

Paper Details

Published: 2024/04/30

Pages: 580-584

DOI: 10.1109/ICASSPW62465.2024.10626267

Container Title: 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW)

Predicting audio quality in voice synthesis and conversion systems is a critical yet challenging task, especially when traditional measures such as Mean Opinion Scores (MOS) are cumbersome to collect at scale. This paper addresses the gap in efficient audio quality prediction, particularly in low-resource settings where extensive MOS data from large-scale listening tests may be unavailable. The authors demonstrate that uncertainty measures derived from out-of-the-box pre-trained self-supervised learning (SSL) models, such as wav2vec, correlate with MOS. The paper explores the extent of this correlation across different models and language contexts, showing how the inherent uncertainties of SSL models can serve as proxies for audio quality assessment.
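As a rough illustration of the idea, the sketch below computes one possible uncertainty measure, the mean per-frame entropy of a model's softmax output, and a Spearman rank correlation that could be used to compare such scores against MOS ratings. The abstract does not say which uncertainty measure or correlation statistic the authors use, so the entropy choice, the function names, and the assumed input shape (frames by classes) are all assumptions made for this example, not the paper's method.

```python
import numpy as np


def mean_frame_entropy(logits) -> float:
    """Average Shannon entropy (in nats) of per-frame softmax distributions.

    logits: array of shape (num_frames, num_classes). Treating these as
    frame-level outputs of a pre-trained SSL model is an assumption of
    this sketch, not something stated in the abstract.
    """
    logits = np.asarray(logits, dtype=np.float64)
    # Numerically stable log-softmax over the class axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    probs = np.exp(log_probs)
    frame_entropy = -(probs * log_probs).sum(axis=-1)
    return float(frame_entropy.mean())


def _average_ranks(values) -> np.ndarray:
    """Ranks starting at 1, with tied values given their average rank."""
    values = np.asarray(values, dtype=np.float64)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values), dtype=np.float64)
    ranks[order] = np.arange(1, len(values) + 1)
    for v in np.unique(values):
        mask = values == v
        ranks[mask] = ranks[mask].mean()
    return ranks


def spearman_correlation(x, y) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = _average_ranks(x), _average_ranks(y)
    return float(np.corrcoef(rx, ry)[0, 1])
```

In use, one would compute `mean_frame_entropy` once per utterance and pass the resulting list, together with the matching MOS ratings, to `spearman_correlation`. No labelled MOS data is needed to produce the uncertainty scores themselves, which is what makes such a setup zero-shot.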

Authors

E. Cooper

J. Yamagishi