Abstract
To simplify speaker diarization and speech separation, a speech signal should first be segregated into two speech formats: dialog and mixture. This paper describes a new algorithm that performs this first step efficiently. The algorithm is based on Perceptual Linear Predictive (PLP) feature extraction, an optimized k-means, and both top-down and bottom-up scenarios. After the features of the observation signal are extracted, k-means clusters statistical properties, such as the variances, of the PDF (histogram) of the clustered extracted features. The k-means is optimized by running the k-means procedure twice and discounting the worst pattern of the clustering step. A feedback loop is needed to guide the optimized k-means by exploiting the attributes of ordinary k-means. The segregation results are excellent, and the diarization error rate calculated on the outputs is very low.
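The two-pass k-means idea can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define the "worst pattern", so the sketch assumes it is the cluster with the largest internal variance, and it assumes the inputs are scalar per-segment variance statistics. The function names `kmeans_1d` and `two_pass_kmeans`, the quantile-based initialization, and the cluster counts are all hypothetical choices made for illustration.

```python
from statistics import mean, pvariance


def kmeans_1d(values, k, iters=50):
    """Plain Lloyd's k-means on scalars with a deterministic quantile init."""
    srt = sorted(values)
    # Assumption: spread the initial centroids evenly over the sorted data.
    centroids = [srt[(2 * i + 1) * len(srt) // (2 * k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assignment step: each value goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: each centroid becomes the mean of its members.
        updated = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            updated.append(mean(members) if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids


def two_pass_kmeans(values, k_first=3, k_final=2):
    """Run k-means, drop the 'worst' cluster, then run k-means again.

    Assumption: 'worst' means the cluster with the largest internal variance.
    """
    labels, _ = kmeans_1d(values, k_first)

    def scatter(c):
        members = [v for v, lab in zip(values, labels) if lab == c]
        return pvariance(members) if len(members) > 1 else 0.0

    worst = max(range(k_first), key=scatter)
    kept_idx = [i for i, lab in enumerate(labels) if lab != worst]
    kept = [values[i] for i in kept_idx]
    final_labels, final_centroids = kmeans_1d(kept, k_final)
    return kept_idx, final_labels, final_centroids


# Hypothetical per-segment variance statistics: two tight groups and one loose one.
segment_variances = [0.10, 0.11, 0.12, 0.90, 0.91, 0.92, 0.35, 0.50, 0.65]
kept_idx, final_labels, final_centroids = two_pass_kmeans(segment_variances)
```

In this toy input the loosely scattered middle group is discarded after the first pass, and the second pass separates the two remaining tight groups. Which group corresponds to dialog and which to mixture is not stated in the abstract, so the sketch leaves the labels uninterpreted.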
Original language | English |
---|---|
Title of host publication | Proceedings of 2015 IEEE International Conference on Progress in Informatics and Computing, PIC 2015 |
Editors | Liang Xiao, Yinglin Wang |
Publisher | IEEE |
Pages | 286-291 |
Number of pages | 6 |
ISBN (Electronic) | 9781467390880 |
ISBN (Print) | 9781467380867 |
Publication status | Published - 13 Jun 2016 |
Externally published | Yes |
Event | 3rd IEEE International Conference on Progress in Informatics and Computing, PIC 2015 - Nanjing, China. Duration: 18 Dec 2015 → 20 Dec 2015 |
Conference
Conference | 3rd IEEE International Conference on Progress in Informatics and Computing, PIC 2015 |
---|---|
Country/Territory | China |
City | Nanjing |
Period | 18/12/15 → 20/12/15 |
Keywords
- bottom-up scenario
- clustering
- diarization error rate
- k-means
- perceptual linear predictive
- segmentation
- speaker diarization
- speech segregation
- speech separation
- top-down scenario