Inter-rater agreement in annotating text world elements in the TextWorlds corpus
Mikhalkova Elena Vladimirovna
European University at Saint Petersburg
Submitted: 03.08.2024
Abstract. In Text World Theory, narratives contain elements (indications of time, place, characters, etc.) that can be identified automatically and compared in order to establish versions of events and find similar plots. While annotating TextWorlds, a corpus of fairy tales and short stories, we found that raters do not always agree on whether a particular word refers to a character, a time, or a place of action. The aim of the research is to determine the degree of inter-rater agreement on the position of these narrative categories in the text. Its practical task is to assess the reliability of the annotation, which will be used to train algorithms that identify text worlds automatically. The scientific novelty lies in studying the degree of agreement itself: other works take agreement for granted and treat disagreement between raters as an error made by one of the raters or a flaw in the annotation procedure. We report results for two inter-rater agreement metrics, percent agreement and Krippendorff’s alpha. They show that agreement on different elements varies from one literary work to another and sometimes reaches a moderate level, which is sufficient to consider the annotation reliable.
Key words and phrases: narrative categories, Text World Theory, inter-rater agreement, annotation of a literary text, agreement metric, annotation reliability
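The two metrics named in the abstract can be illustrated with a minimal sketch. It assumes nominal labels, no missing ratings, and one tuple of labels per annotated word. The function names, the label set (CHAR, TIME, PLACE) and the toy data are hypothetical and are not taken from the TextWorlds corpus or from the authors' code.

```python
from collections import Counter
from itertools import combinations, permutations


def percent_agreement(units):
    """Share of rater pairs that chose the same label, pooled over all units."""
    agree = total = 0
    for labels in units:
        for a, b in combinations(labels, 2):
            agree += a == b
            total += 1
    return agree / total


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal labels (sketch, no missing values)."""
    # Coincidence matrix: every ordered pair of ratings inside a unit,
    # weighted by 1 / (number of ratings in the unit - 1).
    coincidences = Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue  # a unit rated once cannot be paired
        for a, b in permutations(labels, 2):
            coincidences[(a, b)] += 1 / (m - 1)

    # Marginal totals per label and the overall number of pairable ratings.
    totals = Counter()
    for (a, _), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())

    # Observed and expected disagreement (the common factor 1/n is cancelled).
    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n - 1)
    if expected == 0:
        return 1.0  # only one label was ever used, so no disagreement is possible
    return 1 - observed / expected


# Toy data (hypothetical): two raters label four words.
toy_units = [
    ("CHAR", "CHAR"),
    ("TIME", "TIME"),
    ("PLACE", "CHAR"),
    ("CHAR", "CHAR"),
]

if __name__ == "__main__":
    print(percent_agreement(toy_units))
    print(krippendorff_alpha_nominal(toy_units))
```

Percent agreement counts matching pairs only, whereas alpha also corrects for the agreement expected by chance given how often each label is used, which is why the two values can differ noticeably on the same data.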