Algerian Modern Colloquial Arabic Speech Corpus (AMCASC): regional accents recognition within complex socio-linguistic environments

Mouad Djellab, Abderrahmane Amrouche, Ahmed Bouridane, Noureddine Mehallegue

Research output: Contribution to journal › Article › peer-review

20 Citations (Scopus)

Abstract

The Algerian linguistic situation is intricate: ethnic, geographical and colonial influences have led to a complex sociolinguistic environment. As a result of contact between different languages and accents, the Algerian speech community has acquired a distinctive sociolinguistic situation. In addition to the intra- and inter-lingual variations that characterize the day-to-day linguistic behavior of Algerian speakers, their speech is marked by linguistic phenomena such as bilingualism and code switching. The study of automatic regional accent recognition in this type of environment is a new idea in the field of automatic language, dialect and accent recognition, especially since previous studies were conducted using monolingual evaluation data. An assessment of the GMM-UBM and i-vector frameworks for accent recognition, carried out on the Algerian Modern Colloquial Arabic Speech Corpus (AMCASC), a linguistic resource collected for this purpose, shows that the effectiveness of these acoustic approaches is degraded not only by mismatches in recording conditions, channels and recording length and by amplitude clipping, but also by language contact phenomena, a further source of perturbation that should be taken into consideration, especially in real-life applications.
Original language: English
Journal: Language Resources and Evaluation
Publication status: E-pub ahead of print - 11 Mar 2016
