Linguistic profiles of hidden communities: A morphosyntactic aspect
Mamaev Ivan Dmitrievich
Baltic State Technical University “Voenmekh” named after D. F. Ustinov; Saint Petersburg State University
Submitted: 23.02.2024
Abstract. The research aims to identify quantitative regularities in how morphosyntactic parameters function in texts written by users of hidden online communities. Using statistical methods, the paper attempts to confirm the “cohesion” of the main morphosyntactic features, which were extracted with the Profiling-UD linguistic processor. The scientific novelty of the research is that it runs a correlation analysis of morphosyntactic characteristics on a corpus of Russian-language social media texts; these characteristics could become part of a future linguistic profile of hidden communities. Such profiles could be used in modern social media to enhance the functionality of recommendation systems. The study found positive correlations of moderate statistical significance for over 55% of hidden communities. With the proposed methodology, the linguistic profile of hidden communities can be further expanded with syntactic and lexical parameters. This would allow for cluster analysis of communities and for assessing how homogeneously or heterogeneously characteristics from different linguistic levels are used in posts by members of hidden communities.
Key words and phrases: linguistic profiling, corpus of Russian-language social media, morphosyntactic characteristics of posts, hidden communities
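The correlation analysis described in the abstract can be illustrated with a minimal sketch. The abstract does not say which correlation coefficient was used, so Spearman's rank correlation is an assumption here, chosen because it does not require normally distributed data. The feature names (`noun_share`, `adj_share`) and all values are hypothetical toy data, not results from the study or output of Profiling-UD.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-post feature values for one community (toy data):
# share of nouns and share of adjectives among all tokens of each post.
noun_share = [0.31, 0.27, 0.35, 0.22, 0.29, 0.40]
adj_share = [0.12, 0.10, 0.15, 0.08, 0.09, 0.18]

rho = spearman(noun_share, adj_share)
print(f"Spearman rho between noun and adjective share: {rho:.3f}")
```

In a setup like this, the coefficient would be computed for each pair of features within each community, and the proportion of communities showing significant positive correlations could then be reported. Significance testing is omitted from the sketch.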
References:
Bodrova T., Tukmakova N. Opredelenie koeffitsienta rangovoi korrelyatsii chastei rechi v russkikh i chuvashskikh gazetnykh tekstakh // Movoznavchii vіsnik. 2012. № 14-15.
Konyushkevich M. Preobrazovanie predlozhno-padezhnoi sintaksemy v predikativnuyu edinitsu: korrelyatsiya predloga i pokazatelya svyazi slozhnogo predlozheniya // Lіngvіstichnі studії. 2013. № 26.
Kornienko E. R. Idiolekt i idiostil': k voprosu o sootnesenii ponyatii // Filologiya: nauchnye issledovaniya. 2019. № 1.
Mamaev I. D., Mitrofanova O. A. Lingvisticheskie parametry dlya identifikatsii skrytykh setevykh soobshchestv // Terra Linguistica. 2024. T. 15. № 1.
Martynenko G. Ya., Grebennikov A. O. Osnovy stilemetrii: ucheb.-metod. posobie. SPb.: Izd-vo S.-Peterb. un-ta, 2018.
Potebnya A. A. Iz zapisok po russkoi grammatike: v 4-kh t. M.: Uchpedgiz, 1958. T. 1-2.
Russkaya grammatika / gl. red. N. Yu. Shvedova. M.: Nauka, 1980. T. 1. Fonetika. Fonologiya. Udarenie. Intonatsiya. Slovoobrazovanie. Morfologiya.
Tukmakova N. P. Opredelenie koeffitsienta vzaimnoi sopryazhennosti v russkikh i chuvashskikh gazetnykh tekstakh // Filologicheskie nauki. Voprosy teorii i praktiki. 2020. T. 13. Vyp. 7.
Khokhlova M. V., Rubiner V. I. K voprosu o kolichestvennom analize predlozhno-padezhnykh sochetanii v russkom yazyke na primere zakonodatel'nykh tekstov // Korpusnaya lingvistika – 2019: trudy mezhdunarodnoi konferentsii. SPb., 2019.
Baumes J., Goldberg M., Magdon-Ismail M., Wallace W. A. Discovering hidden groups in communication networks // International Conference on Intelligence and Security Informatics. Berlin – Heidelberg: Springer Berlin Heidelberg, 2004.
Brunato D., Cimino A., Dell’Orletta F., Venturi G., Montemagni S. Profiling-UD: A tool for linguistic profiling of texts // Proceedings of the 12th Language Resources and Evaluation Conference. Marseille, 2020.
Curtotti M., McCreath E. C. A corpus of Australian Contract Language: Description, profiling and analysis // Proceedings of the 13th International Conference on Artificial Intelligence and Law. 2011. http://dx.doi.org/10.2139/ssrn.2304652
Hengeveld K. Parts-of-speech systems and morphological types // ACLC Working Papers. 2007. Vol. 2.
Lilliefors H. W. On the Kolmogorov-Smirnov test for normality with mean and variance unknown // Journal of the American Statistical Association. 1967. Vol. 62. No. 318.
Litvinova T., Sboev A., Panicheva P. Profiling the age of Russian bloggers // Conference on Artificial Intelligence and Natural Language. Cham: Springer International Publishing, 2018.
Mishra N., Schreiber R., Stanton I., Tarjan R. E. Clustering social networks // International Workshop on Algorithms and Models for the Web-Graph. Berlin – Heidelberg: Springer Berlin Heidelberg, 2007.
Panicheva P., Litvinova T. Authorship attribution in Russian in real-world forensics scenario // International Conference on Statistical Language and Speech Processing. Cham: Springer International Publishing, 2019.