In this article, I-vector Speaker Identification (SID) is exploited as a modern, state-of-the-art approach that yields a compact, low-dimensional, fixed-length representation. The study is built on four feature combinations derived from Power Normalized Cepstral Coefficient (PNCC) and Mel Frequency Cepstral Coefficient (MFCC) features, together with two previously proposed compensation approaches. The main system is modelled by low-dimensional I-vectors, and we also propose fusion strategies that use higher I-vector dimensions to improve the recognition rate. In addition, cumulative, concatenated, and interleaved fusion techniques are investigated to improve on the conventional late fusion presented in our previous work. Moreover, the proposed system employs an Extreme Learning Machine (ELM) for classification, which is efficient, less complex, and less time-consuming than traditional neural-network-based approaches. The system is evaluated on the TIMIT database in clean and additive white Gaussian noise (AWGN) environments and achieves recognition rates of 96.67% and 80.83%, respectively. Compared with the Gaussian Mixture Model-Universal Background Model (GMM-UBM) in our previously proposed scheme, the system improves the recognition rate by 1.76% for clean speech and by 2.1% for 30 dB AWGN, with the largest improvement, 43.81%, at 10 dB.
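The abstract does not give the ELM configuration, so the following is only a minimal sketch of the general ELM idea: a hidden layer with fixed random weights, followed by output weights fitted in closed form by least squares. The function names `train_elm` and `predict_elm`, the hidden-layer size, the `tanh` activation, and the use of a pseudoinverse are all assumptions made for illustration, and the rows of `X` merely stand in for per-utterance I-vectors; none of this is taken from the paper.

```python
import numpy as np


def train_elm(X, y, n_hidden=64, n_classes=None, seed=0):
    """Fit a basic ELM classifier (illustrative sketch, not the paper's setup).

    X: (n_samples, n_features) array, e.g. one I-vector per row (assumed).
    y: (n_samples,) integer speaker labels in 0..n_classes-1.
    """
    rng = np.random.default_rng(seed)
    if n_classes is None:
        n_classes = int(y.max()) + 1
    # Hidden-layer weights and biases are drawn once and never trained.
    W = rng.standard_normal((X.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    H = np.tanh(X @ W + b)            # hidden-layer activations
    T = np.eye(n_classes)[y]          # one-hot targets
    # Output weights: least-squares solution via the Moore-Penrose pseudoinverse.
    beta = np.linalg.pinv(H) @ T
    return W, b, beta


def predict_elm(model, X):
    """Return the predicted class index for each row of X."""
    W, b, beta = model
    return np.argmax(np.tanh(X @ W + b) @ beta, axis=1)
```

Because only the output weights are solved for, and in a single linear-algebra step, training avoids iterative backpropagation, which is the usual reason ELMs are described as fast to train. A typical call would be `model = train_elm(X_train, y_train)` followed by `predict_elm(model, X_test)`.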
|Title of host publication||2017 Intelligent Systems Conference (IntelliSys)|
|Number of pages||6|
|Publication status||Published - 26 Mar 2018|
|Event||2017 Intelligent Systems Conference, IntelliSys 2017 - London, United Kingdom (7 Sep 2017 → 8 Sep 2017)|