Analysis of the effectiveness of ML algorithms for emotion recognition, taking into account prosodic and spectral features
Zavrumov Zaur Aslanovich, Goncharova Oksana Vladimirovna, Levit Alina Aleksandrovna
Pyatigorsk State University
Submitted: 01.05.2024
Abstract. The study aims to determine the optimal classifier for identifying an emotional state by comparing the effectiveness of several machine learning algorithms trained on a combination of prosodic and spectral features. The scientific novelty lies in applying ML algorithms to the recognition of emotionally marked speech of North Caucasian bilinguals, framed as a binary classification of the presence or absence of an accent, and in determining the optimal combination of universal prosodic and spectral features. In the course of the study, an experimental speech corpus covering three ethnic groups (Russians, Kabardians and Armenians) was created and annotated for degree of accent; prosodic (94 features) and spectral (74 features) characteristics were extracted from the speech signals; and the effectiveness of four machine learning algorithms (logistic regression, k-nearest neighbors, support vector machines, decision trees) was compared on the binary classification of the presence or absence of an accent. The results showed that at the syllable level the decision tree model with combined features is the most effective, while at the phrase level the k-nearest neighbors model with prosodic features performs best. Universal prosodic features that form the basis of the "language model of emotions" were identified, along with typological differences in their realization that reflect the influence of the native language on the emotional speech of bilinguals.
Key words and phrases: language model of emotions, identification of emotional state, machine learning algorithms, prosodic and spectral features in bilingual speech, accent recognition in bilingual speech
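The comparison protocol described in the abstract (one classifier evaluated on prosodic, spectral and combined feature sets) can be illustrated with a minimal sketch. This is not the authors' implementation: the corpus is replaced by hypothetical synthetic data, the function names (`make_synthetic_corpus`, `knn_predict`, `cross_val_accuracy`, `compare_feature_sets`) and the five-fold split are assumptions, and only a plain k-nearest neighbors classifier is shown. The only values taken from the abstract are the feature counts (94 prosodic, 74 spectral).

```python
import math
import random
from collections import Counter

N_PROSODIC, N_SPECTRAL = 94, 74  # feature counts reported in the abstract


def make_synthetic_corpus(n_samples=120, seed=0):
    """Hypothetical stand-in data: Gaussian vectors with a small class-dependent shift."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2  # 1 = accent present, 0 = accent absent
        shift = 0.5 * label
        X.append([rng.gauss(shift, 1.0) for _ in range(N_PROSODIC + N_SPECTRAL)])
        y.append(label)
    return X, y


def knn_predict(train_X, train_y, x, k=5):
    """Majority vote among the k training vectors closest to x (Euclidean distance)."""
    nearest = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def cross_val_accuracy(predict, X, y, columns, n_folds=5):
    """Accuracy pooled over n_folds folds, using only the given feature columns."""
    Xs = [[row[c] for c in columns] for row in X]
    correct = 0
    for fold in range(n_folds):
        # Every n_folds-th sample, starting at `fold`, is held out for testing.
        test_idx = set(range(fold, len(Xs), n_folds))
        train_X = [Xs[i] for i in range(len(Xs)) if i not in test_idx]
        train_y = [y[i] for i in range(len(Xs)) if i not in test_idx]
        correct += sum(predict(train_X, train_y, Xs[i]) == y[i] for i in test_idx)
    return correct / len(Xs)


# Column indices for each feature set (the column layout is an assumption).
FEATURE_SETS = {
    "prosodic": list(range(N_PROSODIC)),
    "spectral": list(range(N_PROSODIC, N_PROSODIC + N_SPECTRAL)),
    "combined": list(range(N_PROSODIC + N_SPECTRAL)),
}


def compare_feature_sets(X, y, predict=knn_predict):
    """Return {feature set name: cross-validated accuracy} for one classifier."""
    return {name: cross_val_accuracy(predict, X, y, cols)
            for name, cols in FEATURE_SETS.items()}


if __name__ == "__main__":
    X, y = make_synthetic_corpus()
    for name, acc in compare_feature_sets(X, y).items():
        print(f"{name}: {acc:.2f}")
```

Any other classifier with the same `predict(train_X, train_y, x)` signature could be passed to `compare_feature_sets`, so logistic regression, support vector machines and decision trees would slot into the same loop. The accuracies printed for the synthetic data say nothing about the results reported in the study.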