Studi Slavistici XXII (2025) Special Issue

The Digital Revolution in Slavic Manuscript Studies: HTR Technology and its Impact on Philological Research

Achim Rabus
University of Freiburg
Martin Meindl
University of Freiburg

Published 2025-12-12

Keywords

  • Handwritten Text Recognition,
  • mass digitization,
  • Artificial Intelligence in philology,
  • Church Slavonic,
  • corpus linguistics,
  • multiple use of data

Abstract

The paper highlights recent advances in computer-assisted manuscript transcription using Handwritten Text Recognition (HTR) programs such as Transkribus. We present numerous examples showing the capabilities of current HTR models across different Slavic scripts and handwriting styles, and discuss ways to use automatically transcribed sources for multiple purposes. We demonstrate that transcription quality is stable throughout an entire document and that researchers can gauge the quality of their HTR transcription from a limited number of pages. A cost calculation then shows that using HTR to pre-transcribe manuscripts and printed books makes the overall transcription process significantly cheaper, enabling projects that could not otherwise be conducted for financial reasons. The paper concludes with an appeal to share training data and make ample use of these new advancements in HTR.
