Identification of “toxicity” in social networks based on the semantic proximity criterion
Kurganskaia Ekaterina Vladimirovna, Stepanova Natalia Valentinovna
Saint Petersburg Electrotechnical University “LETI”
Submitted: 26.02.2024
Abstract. The research aims to test the effectiveness of a method for automatically identifying “toxic” user comments in social networks based on semantic proximity. The article presents a linguistic analysis of examples of “toxic” behavior and defines the criteria of “toxicity” and the main lexical and stylistic features of “toxic” texts. A review of recent work on the topic gives an overview of current methods of identifying “toxicity”. The article then tests a solution based on the idea that a “toxic” comment lacks semantic proximity to the text of the post it responds to. The scientific novelty lies in proposing, for the first time, the criterion of semantic proximity for identifying “toxic” comments, which is a fairly simple and effective solution. Moreover, no such studies have previously been conducted for VKontakte, the most popular Russian-language social network. The research found that measuring the semantic proximity between a post and a comment is a fairly effective way to determine the relevance of the comment and, consequently, its probable “toxic” connotation. It also found that the cosine similarity metric is suitable for experiments on identifying “toxicity”, but that the results can be improved by supplementing it with other machine learning methods.
Key words and phrases: toxicity in social networks, relevance of comments, semantic proximity, word vector embeddings
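The semantic proximity criterion described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy embedding table, the averaging of word vectors, the function names, and the 0.5 threshold are all assumptions made for illustration. A real experiment would use pretrained word embeddings and a threshold tuned on labeled data.

```python
import math

# Hypothetical toy word embeddings (3 dimensions), for illustration only.
# A real system would load pretrained word vectors instead.
TOY_EMBEDDINGS = {
    "football": [0.90, 0.10, 0.00],
    "match":    [0.80, 0.20, 0.00],
    "goal":     [0.85, 0.15, 0.05],
    "team":     [0.70, 0.30, 0.00],
    "idiot":    [0.00, 0.10, 0.90],
    "stupid":   [0.05, 0.00, 0.95],
}


def mean_vector(text, embeddings):
    """Average the vectors of known words; return None if no word is known."""
    vectors = [embeddings[t] for t in text.lower().split() if t in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def semantic_proximity(post, comment, embeddings=TOY_EMBEDDINGS):
    """Cosine similarity between averaged post and comment vectors, or None."""
    post_vec = mean_vector(post, embeddings)
    comment_vec = mean_vector(comment, embeddings)
    if post_vec is None or comment_vec is None:
        return None
    return cosine_similarity(post_vec, comment_vec)


def is_possibly_toxic(post, comment, threshold=0.5, embeddings=TOY_EMBEDDINGS):
    """Flag a comment whose proximity to the post falls below the threshold.

    The threshold is an assumed value. Low relevance is only a signal of
    probable toxicity, not proof of it.
    """
    score = semantic_proximity(post, comment, embeddings)
    return score is not None and score < threshold


if __name__ == "__main__":
    post = "football match goal"
    for comment in ("team goal", "stupid idiot"):
        print(comment, semantic_proximity(post, comment),
              is_possibly_toxic(post, comment))
```

In this sketch an on-topic comment scores close to 1 and is not flagged, while an off-topic insult scores close to 0 and is flagged. A comment with no known words returns None and is left for other methods to decide.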