Audio classification using attention-augmented convolutional neural network

Yu Wu, Hua Mao, Zhang Yi

Research output: Contribution to journal › Article › peer-review

16 Citations (Scopus)
284 Downloads (Pure)

Abstract

Audio classification, as a set of important and challenging tasks, groups speech signals according to speakers’ identities, accents, and emotional states. Due to the high dimensionality of audio data, task-specific hand-crafted feature extraction is always required and is regarded as cumbersome for various audio classification tasks. More importantly, the inherent relationship among features has not been fully exploited. In this paper, the original speech signal is first represented as a spectrogram and then split along the frequency domain to form a frequency-distributed spectrogram. This paper proposes a task-independent model, called FreqCNN, to automatically extract distinctive features from each frequency band by using convolutional kernels. Furthermore, an attention mechanism is introduced to systematically enhance the features from certain frequency bands. The proposed FreqCNN is evaluated on three publicly available speech databases through three independent classification tasks. The obtained results demonstrate superior performance over the state of the art.
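
The two ideas named in the abstract, splitting a spectrogram along the frequency axis and reweighting the resulting bands with attention, can be illustrated with a minimal NumPy sketch. This is not the FreqCNN architecture from the paper: the function names, the band count, the mean-energy "features" (standing in for learned convolutional features), and the softmax scoring are all assumptions made for illustration.

```python
import numpy as np


def split_frequency_bands(spectrogram, num_bands):
    """Split a (freq_bins, time_frames) spectrogram into bands along the frequency axis."""
    return np.array_split(spectrogram, num_bands, axis=0)


def band_attention(band_features, scores):
    """Softmax-normalise one score per band and rescale each band's feature vector."""
    scores = np.asarray(scores, dtype=float)
    exp = np.exp(scores - scores.max())  # subtract the max for numerical stability
    weights = exp / exp.sum()
    return weights, band_features * weights[:, None]


# Hypothetical input: random values standing in for a real spectrogram.
rng = np.random.default_rng(0)
spectrogram = rng.random((128, 200))  # 128 frequency bins, 200 time frames

bands = split_frequency_bands(spectrogram, num_bands=4)

# Assumption: per-frame mean energy replaces the learned convolutional features.
band_features = np.stack([band.mean(axis=0) for band in bands])  # shape (4, 200)

# Assumption: the attention score of a band is simply its average feature value.
scores = band_features.mean(axis=1)
weights, attended = band_attention(band_features, scores)
```

In a trained model the scores would come from learned parameters rather than from band energy; the sketch only shows how per-band weights that sum to one can emphasise some frequency bands over others.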
Original language: English
Pages (from-to): 90-100
Number of pages: 11
Journal: Knowledge-Based Systems
Volume: 161
Early online date: 26 Jul 2018
Publication status: Published - 1 Dec 2018
