Experimental Database Modelling of a Balanced Linguistic Corpus
Gorozhanov Alexey Ivanovich
Moscow State Linguistic University
Submitted: 04.09.2022
Abstract. The research aims to build a functioning experimental model of a relational database for working with a balanced linguistic corpus of a work of fiction. The scientific novelty lies in the fact that, for the first time within a humanities study, a linguistic corpus database is modelled with a thorough description of its technical details, based on the author’s concept of professionally oriented programming. The work involved three stages: drawing up a technical specification (the structure of two relational database tables was developed, the SQLite format was selected, and additional table columns were provided for later expansion of the research content); writing the source code for creating and filling the database (using the Python programming language and the spaCy natural language processing library); and testing the code on the texts of three novels by F. Kafka, “The Castle”, “Amerika” and “The Trial” (three functioning databases were created). The findings show that modern natural language processing tools make it possible to automatically create full-fledged databases that can process SQL queries and can be further expanded manually or automatically.
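The three stages described above can be illustrated with a minimal sketch. It is not the author's source code: the abstract does not give the table layout, so the table names (`sentences`, `tokens`), all column names, the spare `note` columns and the helper functions below are assumptions made for illustration. The spaCy model name `de_core_news_sm` is likewise an assumption (a German pipeline that would have to be installed separately); spaCy is imported only inside `fill_from_text`, so the rest of the sketch runs on the Python standard library alone.

```python
import sqlite3

# Hypothetical two-table layout; the spare "note" columns stand in for the
# "additional columns" reserved for later expansion of the research content.
SCHEMA = """
CREATE TABLE IF NOT EXISTS sentences (
    id   INTEGER PRIMARY KEY,
    text TEXT NOT NULL,
    note TEXT
);
CREATE TABLE IF NOT EXISTS tokens (
    id          INTEGER PRIMARY KEY,
    sentence_id INTEGER NOT NULL REFERENCES sentences(id),
    position    INTEGER NOT NULL,
    form        TEXT NOT NULL,
    lemma       TEXT,
    pos         TEXT,
    note        TEXT
);
"""


def create_corpus_db(path=":memory:"):
    """Open (or create) an SQLite database and make sure both tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_sentence(conn, text, tokens):
    """Store one sentence and its tokens; tokens are (form, lemma, pos) triples."""
    cur = conn.execute("INSERT INTO sentences (text) VALUES (?)", (text,))
    sentence_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO tokens (sentence_id, position, form, lemma, pos) "
        "VALUES (?, ?, ?, ?, ?)",
        [(sentence_id, i, form, lemma, pos)
         for i, (form, lemma, pos) in enumerate(tokens)],
    )
    conn.commit()
    return sentence_id


def fill_from_text(conn, text, model_name="de_core_news_sm"):
    """Fill the database automatically with spaCy (model name is an assumption)."""
    import spacy  # imported here so the rest of the sketch works without spaCy

    nlp = spacy.load(model_name)
    for sent in nlp(text).sents:
        rows = [(t.text, t.lemma_, t.pos_) for t in sent if not t.is_space]
        add_sentence(conn, sent.text.strip(), rows)


def lemma_frequencies(conn, pos):
    """Example SQL query: lemma counts for one part-of-speech tag."""
    return conn.execute(
        "SELECT lemma, COUNT(*) AS n FROM tokens "
        "WHERE pos = ? GROUP BY lemma ORDER BY n DESC, lemma",
        (pos,),
    ).fetchall()
```

With a spaCy pipeline installed, a call such as `fill_from_text(create_corpus_db("castle.db"), novel_text)` would populate one database per novel (the file name and the `novel_text` variable are hypothetical), after which queries like `lemma_frequencies(conn, "VERB")` can be run against it. Keeping sentences and tokens in separate tables linked by `sentence_id` is one common way to allow both context lookup and word-level statistics; the layout used in the study may differ.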
Key words and phrases: relational database, corpus linguistics, professionally oriented programming, SQLite, spaCy