Cluster analysis of linguistic profiles of hidden communities
Mamaev Ivan Dmitrievich
Baltic State Technical University “Voenmeh” named after D. F. Ustinov; Saint Petersburg State University
Submitted: 04.05.2024
Abstract. The study aims to identify clusters of hidden-community profiles on the basis of linguistic parameters. The article analyzes the structure of these clusters and the relationships between their attributes. The scientific novelty lies in combining hierarchical cluster analysis of hidden network communities with analysis of variance, which makes it possible to reveal how homogeneous or heterogeneous authored texts are at the grammatical and lexical levels. Using Ward's method, three clusters of linguistic profiles were identified, and a Silhouette Score was computed for each as a formal quality measure. A qualitative assessment of the profiles is given in the form of linguistic commentary. The study found that online posts vary at the level of syntax but not at the level of morphology. The proposed approach to community clustering can be used to identify potentially dangerous online subcultures and opinion leaders. Applying it supplements the linguistic profiles of communities with digital sociodemographic information.
Key words and phrases: cluster analysis, hidden communities of social networks, linguistic profiling, morphosyntactic characteristics of posts
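The pipeline named in the abstract (Ward's hierarchical clustering into three clusters, a Silhouette Score, and analysis of variance) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the author's implementation: the data are synthetic, the two feature columns are hypothetical stand-ins for linguistic parameters, and the use of SciPy and the helper `mean_silhouette` are choices made for this sketch. It reports one mean silhouette value over all points rather than a per-cluster score.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist, squareform
from scipy.stats import f_oneway


def mean_silhouette(X, labels):
    """Mean silhouette coefficient over all points (Euclidean distances)."""
    D = squareform(pdist(X))
    scores = []
    for i, own in enumerate(labels):
        same = labels == own
        same[i] = False  # exclude the point itself from its own cluster
        if not same.any():
            scores.append(0.0)  # singleton cluster: silhouette is taken as 0
            continue
        a = D[i, same].mean()  # mean distance to the point's own cluster
        b = min(D[i, labels == other].mean()  # nearest other cluster
                for other in set(labels) if other != own)
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))


# ASSUMPTION: synthetic "community profiles"; the two columns are
# hypothetical features (e.g. a syntactic and a morphological measure).
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(10, 2)) for c in centers])

# Agglomerative clustering with Ward's criterion, cut into three clusters.
Z = linkage(X, method="ward")
labels = fcluster(Z, t=3, criterion="maxclust")

# Formal quality measure for the resulting partition.
score = mean_silhouette(X, labels)

# One-way ANOVA: does the first feature differ across the clusters?
groups = [X[labels == k, 0] for k in np.unique(labels)]
f_stat, p_value = f_oneway(*groups)

print(f"silhouette={score:.3f}, F={f_stat:.2f}, p={p_value:.3g}")
```

In this sketch a high silhouette value indicates compact, well-separated clusters, and a small p-value from the ANOVA suggests that the chosen feature is heterogeneous across clusters. Real profile data would need feature scaling before the distance computation.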