Master's Thesis: Minimum Word Error Rate Training for Speech Separation

June 30, 2019

Master's ThesisSpeech SeparationASRWeighted Loss

Minimum Word Error Rate Training for Speech Separation

This post summarizes my M.S. thesis work at the University of Stavanger.

Title: Minimum Word Error Rate Training for Speech Separation
Supervisor: Tomasz Wiktorski
NVA record: https://nva.sikt.no/registration/019903f37e7a-54d82b2c-62ec-4df4-bbeb-cfb25fd257a6
Project code and demo: https://github.com/thejungwon/mwetss

The key idea was to modify the conventional speech separation objective with a WER-guided weighted loss:

Start from the standard speech separation loss.
Use a pre-trained speech recognition model to estimate recognition quality.
Apply WER-based weighting so the separation model emphasizes samples/regions that matter more for downstream recognition.
Train the separator with this weighted objective instead of a purely signal-level loss.

In short, the thesis tied separation training more directly to recognition outcomes, not only waveform/spectrogram similarity.