TY - JOUR
T1 - Algerian Modern Colloquial Arabic Speech Corpus (AMCASC): regional accents recognition within complex socio-linguistic environments
AU - Djellab, Mouad
AU - Amrouche, Abderrahmane
AU - Bouridane, Ahmed
AU - Mehallegue, Noureddine
PY - 2016/3/11
Y1 - 2016/3/11
N2 - The Algerian linguistic situation is very intricate due to the ethnic, geographical and colonial occupation influences which have lead to a complex sociolinguistic environment. As a result of the contact between different languages and accents, the Algerian speech community has acquired a distinctive sociolinguistic situation. In addition to the intra- and inter- lingual variations describing day-to-day linguistic behavior of the Algerian speakers, their speech is characterized by the presence of many linguistic phenomena such as bilingualism and code switching. The study of automatic regional accent recognition in such a type of environment is a new idea in the field of automatic languages, dialect and accent recognition especially that previous studies were conducted using monolingual evaluation data. The assessment of the effectiveness of GMM-UBM and i-vectors frameworks for accent recognition approaches through the use of the Algerian Modern Colloquial Arabic Speech Corpus (AMCASC), which is a linguistic resource collected for this purpose, shows that not only the recording conditions mismatch, channels mismatch, recordings length mismatch and the amplitude clipping which have a non-desirable effect on the effectiveness of these acoustic approaches but also language contact phenomena are other perturbation sources which should be taken into consideration especially in real life applications.
AB - The Algerian linguistic situation is very intricate due to the ethnic, geographical and colonial occupation influences which have lead to a complex sociolinguistic environment. As a result of the contact between different languages and accents, the Algerian speech community has acquired a distinctive sociolinguistic situation. In addition to the intra- and inter- lingual variations describing day-to-day linguistic behavior of the Algerian speakers, their speech is characterized by the presence of many linguistic phenomena such as bilingualism and code switching. The study of automatic regional accent recognition in such a type of environment is a new idea in the field of automatic languages, dialect and accent recognition especially that previous studies were conducted using monolingual evaluation data. The assessment of the effectiveness of GMM-UBM and i-vectors frameworks for accent recognition approaches through the use of the Algerian Modern Colloquial Arabic Speech Corpus (AMCASC), which is a linguistic resource collected for this purpose, shows that not only the recording conditions mismatch, channels mismatch, recordings length mismatch and the amplitude clipping which have a non-desirable effect on the effectiveness of these acoustic approaches but also language contact phenomena are other perturbation sources which should be taken into consideration especially in real life applications.
U2 - 10.1007/s10579-016-9347-6
DO - 10.1007/s10579-016-9347-6
M3 - Article
SN - 1574-020X
JO - Language Resources and Evaluation
JF - Language Resources and Evaluation
ER -