Statistical speaker diarization using dependent combination of extracted features

Hasan Almgotir Kadhim*, Lok Woo, Satnam Dlay

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

The paper describes a novel method that improves the procedure for supervised speaker diarization. The procedure assumes that a database of the speakers is available. First, the database and the observation signal of the speakers are prepared. Audio features are then extracted from both the database and the observation signal. Instead of using only one of Mel Frequency Cepstral Coefficients, Perceptual Linear Prediction, or Power Normalized Cepstral Coefficients, a combination of all three is used. The combination of these features is dependent, i.e. they are concatenated in the feature matrix. The features of the observation signal are compared with the statistical properties of the database features, and the result of this comparison determines a logical mask. Bottom-up and top-down scenarios work together to reach the final decisions. The Diarization Error Rate test shows that the combination of features yields fewer errors than any single feature alone.
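
The concatenation and comparison steps summarised in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the authors' implementation: the function names are hypothetical, the "statistical properties" are taken to be a per-dimension mean and standard deviation, the logical mask marks dimensions within k standard deviations of a speaker's mean, and each frame goes to the speaker with the largest fraction of True mask entries. Random arrays stand in for real MFCC, PLP, and PNCC features.

```python
import numpy as np


def combine_features(mfcc, plp, pncc):
    """Concatenate per-frame feature matrices column-wise.

    Each input has shape (frames, dims_i); the result has shape
    (frames, dims_mfcc + dims_plp + dims_pncc).
    """
    return np.hstack([mfcc, plp, pncc])


def speaker_stats(features):
    """Per-dimension mean and standard deviation (assumed statistics)."""
    return features.mean(axis=0), features.std(axis=0)


def label_frames(observation, stats, k=2.0):
    """Label each observation frame with the index of the best-matching speaker.

    stats is a list of (mean, std) pairs, one per database speaker.
    The threshold k and the argmax decision rule are illustrative assumptions.
    """
    scores = []
    for mean, std in stats:
        # Logical mask: True where a feature lies within k std of the mean.
        mask = np.abs(observation - mean) <= k * std
        # Score a frame by the fraction of its dimensions that match.
        scores.append(mask.mean(axis=1))
    return np.argmax(np.vstack(scores), axis=0)


# Stand-in data: random features with a different offset per speaker.
rng = np.random.default_rng(0)


def fake_features(frames, offset):
    return combine_features(
        rng.normal(offset, 1.0, (frames, 13)),  # placeholder for MFCC
        rng.normal(offset, 1.0, (frames, 9)),   # placeholder for PLP
        rng.normal(offset, 1.0, (frames, 13)),  # placeholder for PNCC
    )


database = [fake_features(200, 0.0), fake_features(200, 5.0)]
stats = [speaker_stats(f) for f in database]
observation = np.vstack([fake_features(5, 0.0), fake_features(5, 5.0)])
labels = label_frames(observation, stats)
```

In this sketch the concatenated matrix has 35 columns per frame, so a single comparison uses all three feature types at once instead of relying on one of them.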

Original language: English
Title of host publication: Proceedings - AIMS 2015, 3rd International Conference on Artificial Intelligence, Modelling and Simulation
Editors: Mohd Hanafi Ahmad Hijazi, Ismail Saad, David Al-Dabass, Nurmin Bolong
Publisher: IEEE
Pages: 291-296
Number of pages: 6
ISBN (Electronic): 9781467386753
ISBN (Print): 9781467386760
Publication status: Published - 24 Oct 2016
Event: 3rd International Conference on Artificial Intelligence, Modelling and Simulation, AIMS 2015 - Kota Kinabalu, Sabah, Malaysia
Duration: 2 Dec 2015 to 4 Dec 2015

Conference

Conference: 3rd International Conference on Artificial Intelligence, Modelling and Simulation, AIMS 2015
Country: Malaysia
City: Kota Kinabalu, Sabah
Period: 2/12/15 to 4/12/15
