TY - JOUR
T1 - Informed single-channel speech separation using HMM-GMM user-generated exemplar source
AU - Wang, Qi
AU - Woo, W. L.
AU - Dlay, S. S.
PY - 2014/12
Y1 - 2014/12
N2 - We present a new approach for solving single-channel speech separation with the aid of a user-generated exemplar source recorded from a microphone. Our method deviates from conventional model-based methods, which rely heavily on speaker-dependent training data. We readdress the problem by offering a new approach based on utterance-dependent patterns extracted from the user-generated exemplar source. Our proposed approach is less restrictive: it does not require speaker-dependent information, yet it exceeds the performance of conventional model-based separation methods in separating male-male speech mixtures. We combine general speaker-independent (SI) features with specifically generated utterance-dependent (UD) features in a joint probability model. The UD features are initially extracted from the user-generated exemplar source and represented as statistical estimates. These estimates are calibrated using information extracted from the mixture source so that they statistically represent the target source. The UD probability model is subsequently generated to resolve problems of ambiguity and to offer better cues for separation. The proposed algorithm is tested and compared with recent methods using the GRID database and the Mocha-TIMIT database.
AB - We present a new approach for solving single-channel speech separation with the aid of a user-generated exemplar source recorded from a microphone. Our method deviates from conventional model-based methods, which rely heavily on speaker-dependent training data. We readdress the problem by offering a new approach based on utterance-dependent patterns extracted from the user-generated exemplar source. Our proposed approach is less restrictive: it does not require speaker-dependent information, yet it exceeds the performance of conventional model-based separation methods in separating male-male speech mixtures. We combine general speaker-independent (SI) features with specifically generated utterance-dependent (UD) features in a joint probability model. The UD features are initially extracted from the user-generated exemplar source and represented as statistical estimates. These estimates are calibrated using information extracted from the mixture source so that they statistically represent the target source. The UD probability model is subsequently generated to resolve problems of ambiguity and to offer better cues for separation. The proposed algorithm is tested and compared with recent methods using the GRID database and the Mocha-TIMIT database.
KW - Concurrent pitch tracking
KW - Exemplar assistance
KW - Factorial hidden Markov model (FHMM)
KW - Gaussian mixture model (GMM)
KW - Informed Source Separation (ISS)
KW - Single-channel source separation (SCSS)
KW - Speaker-assisted source separation
U2 - 10.1109/TASLP.2014.2357677
DO - 10.1109/TASLP.2014.2357677
M3 - Article
AN - SCOPUS:84921788585
VL - 22
SP - 2087
EP - 2100
JO - IEEE/ACM Transactions on Audio, Speech, and Language Processing
JF - IEEE/ACM Transactions on Audio, Speech, and Language Processing
SN - 2329-9290
IS - 12
ER -