Software tools for creating and analyzing a text data bank of short electronic messages from social network users
Loginova Alina Olegovna, Gorozhanov Alexey Ivanovich, Aleynikova Darya Viktorovna
Moscow State Linguistic University
Moscow State Linguistic University; Peoples’ Friendship University of Russia
Submitted: 12.09.2023
Abstract. The research aims to develop an algorithm for creating and analyzing a text data bank of short electronic messages (posts) from social networks using free software tools. The scientific novelty lies in the interdisciplinary approach to this problem, which draws on the latest achievements of applied and mathematical linguistics and information security and takes the current regulatory framework into account. Following the proposed graphical model, the work proceeded in three steps: textual research material of ca. 1.5 MB was collected using the Web Scraper plug-in; a text data bank of short electronic messages was generated and converted into CSV format suitable for further processing; and a basic analysis of this data bank was carried out using the free PolyAnalyst software package, covering the extraction of terms, entities and keywords, sentiment analysis, and determination of the subject matter of the texts. As a result, the algorithm was shown to be functional, and prospects for further research were identified: working with big text data and analyzing it to find destructive content.
Key words and phrases: corpus linguistics, text data bank, information security, texts of short electronic messages, destructive content
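The conversion step named in the abstract (scraped posts turned into a CSV data bank) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `build_data_bank` and `to_csv`, the input record shape, and the column names `id`, `source` and `text` are all assumptions made for illustration, and the input does not reproduce the actual export format of the Web Scraper plug-in. The sketch only normalizes whitespace, drops empty and exactly duplicated posts, and writes the result as CSV with Python's standard `csv` module.

```python
import csv
import io


def build_data_bank(raw_posts):
    """Normalize scraped posts; skip empty texts and exact duplicates.

    `raw_posts` is a hypothetical list of dicts with "source" and "text" keys.
    """
    seen = set()
    bank = []
    for post in raw_posts:
        # Collapse runs of whitespace (including newlines) into single spaces.
        text = " ".join(post.get("text", "").split())
        if not text or text in seen:
            continue
        seen.add(text)
        bank.append({
            "id": len(bank) + 1,  # sequential identifier within the bank
            "source": post.get("source", ""),
            "text": text,
        })
    return bank


def to_csv(bank):
    """Serialize the data bank to a CSV string with every field quoted."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(
        buffer,
        fieldnames=["id", "source", "text"],  # assumed column layout
        quoting=csv.QUOTE_ALL,
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(bank)
    return buffer.getvalue()


if __name__ == "__main__":
    sample = [
        {"source": "network_a", "text": "  First   post \n"},
        {"source": "network_b", "text": "First post"},  # duplicate after cleanup
        {"source": "network_c", "text": 'A post with "quotes", and a comma'},
    ]
    print(to_csv(build_data_bank(sample)))
```

Quoting every field is a conservative choice for short messages, which often contain commas and quotation marks; whether a downstream analysis package expects this particular layout is not something the abstract states.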