<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.3 20210610//EN" "JATS-journalpublishing1-3.dtd">
<article article-type="research-article" dtd-version="1.3" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xml:lang="ru"><front><journal-meta><journal-id journal-id-type="publisher-id">ellibs</journal-id><journal-title-group><journal-title xml:lang="ru">Электронные библиотеки</journal-title><trans-title-group xml:lang="en"><trans-title>Russian Digital Libraries Journal</trans-title></trans-title-group></journal-title-group><issn pub-type="epub">1562-5419</issn><publisher><publisher-name>Казанский (Приволжский) федеральный университет</publisher-name></publisher></journal-meta><article-meta><article-id pub-id-type="doi">10.26907/1562-5419-2025-28-6-1385-1414</article-id><article-id custom-type="elpub" pub-id-type="custom">ellibs-625</article-id><article-categories><subj-group subj-group-type="heading"><subject>Research Article</subject></subj-group><subj-group subj-group-type="section-heading" xml:lang="ru"><subject>Статьи</subject></subj-group></article-categories><title-group><article-title>Пост-коррекция слабой расшифровки большими языковыми моделями в итерационном процессе распознавания рукописей</article-title><trans-title-group xml:lang="en"><trans-title>Post-Correction of Weak Transcriptions by Large Language Models in the Iterative Process of Handwritten Text Recognition</trans-title></trans-title-group></title-group><contrib-group><contrib contrib-type="author" corresp="yes"><name-alternatives><name name-style="eastern" xml:lang="ru"><surname>Зыков</surname><given-names>Валерий Павлович</given-names></name><name name-style="western" xml:lang="en"><surname>Zykov</surname><given-names>Valerii Pavlovich</given-names></name></name-alternatives><email xlink:type="simple">zykovvp@my.msu.ru</email><xref ref-type="aff" rid="aff-1"/></contrib><contrib contrib-type="author" corresp="yes"><name-alternatives><name name-style="eastern" 
xml:lang="ru"><surname>Местецкий</surname><given-names>Леонид Моисеевич</given-names></name><name name-style="western" xml:lang="en"><surname>Mestetskiy</surname><given-names>Leonid Moiseevich</given-names></name></name-alternatives><email xlink:type="simple">mestlm@mail.ru</email><xref ref-type="aff" rid="aff-2"/></contrib></contrib-group><aff-alternatives id="aff-1"><aff xml:lang="ru"><institution>Московский государственный университет имени М. В. Ломоносова</institution></aff><aff xml:lang="en"><institution>Lomonosov Moscow State University</institution></aff></aff-alternatives><aff-alternatives id="aff-2"><aff xml:lang="ru"><institution>НИУ Высшая школа экономики</institution></aff><aff xml:lang="en"><institution>Higher School of Economics</institution></aff></aff-alternatives><pub-date pub-type="collection"><year>2025</year></pub-date><pub-date pub-type="epub"><day>19</day><month>12</month><year>2025</year></pub-date><volume>28</volume><issue>6</issue><fpage>1385</fpage><lpage>1414</lpage><permissions><copyright-statement>Copyright &#x00A9; Зыков В.П., Местецкий Л.М., 2025</copyright-statement><copyright-year>2025</copyright-year><copyright-holder xml:lang="ru">Зыков В.П., Местецкий Л.М.</copyright-holder><copyright-holder xml:lang="en">Zykov V.P., Mestetskiy L.M.</copyright-holder><license xml:lang="ru" license-type="creative-commons-attribution" xlink:href="https://creativecommons.org/licenses/by/4.0/" xlink:type="simple"><license-p>Данная работа распространяется под лицензией Creative Commons Attribution 4.0.</license-p></license><license xml:lang="en" license-type="creative-commons-attribution" xlink:href="https://creativecommons.org/licenses/by/4.0/" xlink:type="simple"><license-p>This work is licensed under a Creative Commons Attribution 4.0 License.</license-p></license></permissions><self-uri xlink:href="https://ellibs.elpub.ru/jour/article/view/625">https://ellibs.elpub.ru/jour/article/view/625</self-uri><abstract><p>Рассмотрена задача ускорения 
построения точной редакторской разметки рукописных архивных текстов в рамках инкрементного цикла обучения на основе слабой расшифровки. В отличие от ранее опубликованных результатов, основное внимание уделено интеграции автоматической посткоррекции слабой расшифровки с помощью больших языковых моделей (Large Language Models, LLM). Предложен и реализован протокол применения LLM на уровне строк в режиме обучения на нескольких примерах с тщательно сконструированными промптами и контролем формата вывода (сохранение дореформенной орфографии, защита имен и числительных, запрет на изменение структуры строк). Эксперименты проведены на корпусе дневников А. В. Сухово-Кобылина. В качестве базовой модели распознавания использована строчная версия модели Vertical Attention Network. Результаты показали, что LLM-коррекция на примере сервиса ChatGPT-4o заметно улучшает читабельность слабой разметки и существенно снижает процент ошибок в словах (в нашем опыте – порядка −12 процентных пунктов), при этом не внося ухудшения в проценте ошибок в буквах. Другой исследуемый сервис – DeepSeek-R1 – показал менее стабильное поведение. Рассмотрены практические настройки промптов, ограничения (контекстные лимиты, риск «галлюцинаций») и даны рекомендации по безопасной интеграции LLM-коррекции в итерационный пайплайн разметки с целью сокращения трудозатрат эксперта-асессора и ускорения оцифровки исторических архивов.
</p></abstract><trans-abstract xml:lang="en"><p>This paper addresses the problem of accelerating the construction of accurate editorial annotations for handwritten archival texts within an incremental training cycle based on weak transcription. Unlike our previously published results, the present work focuses on integrating automatic post-correction of weak transcriptions using large language models (LLMs). We propose and implement a protocol for applying LLMs at the line level in a few-shot setup with carefully designed prompts and strict output format control (preservation of pre-reform orthography, protection of proper names and numerals, prohibition of structural changes to lines). Experiments are conducted on the corpus of diaries by A.V. Sukhovo-Kobylin. As the base recognition model, we use the line-level variant of the Vertical Attention Network (VAN). Results show that LLM post-correction, exemplified by the ChatGPT-4o service, noticeably improves the readability of weak transcriptions and substantially reduces the word error rate (by about 12 percentage points in our experiments) without degrading the character error rate. Another service tested, DeepSeek-R1, demonstrated less stable behavior. We discuss practical prompt settings and limitations (context length limits, risk of “hallucinations”) and provide recommendations for the safe integration of LLM post-correction into an iterative annotation pipeline to reduce expert annotators’ workload and speed up the digitization of historical archives.
</p></trans-abstract><kwd-group xml:lang="ru"><kwd>распознавание рукописного текста</kwd><kwd>слабая разметка</kwd><kwd>Vertical Attention Network (VAN)</kwd><kwd>большие языковые модели (LLM)</kwd><kwd>посткоррекция</kwd><kwd>итерационное дообучение</kwd></kwd-group><kwd-group xml:lang="en"><kwd>handwritten text recognition</kwd><kwd>weak markup</kwd><kwd>Vertical Attention Network (VAN)</kwd><kwd>large language models (LLM)</kwd><kwd>post-correction</kwd><kwd>iterative retraining</kwd></kwd-group></article-meta></front><back><ref-list><title>References</title><ref id="cit1"><label>1</label><citation-alternatives><mixed-citation xml:lang="ru">Penskaya E.N., Kuptsova O.N. (2024) The Invisible Quantity. A.V. Sukhovo-Kobylin: Theater, Literature, Life. Moscow: HSE Publishing House, 2024. 472 p. (In Russ.)</mixed-citation><mixed-citation xml:lang="en">Penskaya E.N., Kuptsova O.N. (2024) The Invisible Quantity. A.V. Sukhovo-Kobylin: Theater, Literature, Life. Moscow: HSE Publishing House, 2024. 472 p. (In Russ.)</mixed-citation></citation-alternatives></ref><ref id="cit2"><label>2</label><citation-alternatives><mixed-citation xml:lang="ru">Mestetsky L.M., Smirnova V.S. Line segmentation in images of handwritten documents // Proceedings of the International Conference on Computer Graphics and Vision (Grafikon-2025). Yoshkar-Ola: Volga State Technological University, 2025. (In Russ.)</mixed-citation><mixed-citation xml:lang="en">Mestetsky L.M., Smirnova V.S. Line segmentation in images of handwritten documents // Proceedings of the International Conference on Computer Graphics and Vision (Grafikon-2025). Yoshkar-Ola: Volga State Technological University, 2025. (In Russ.)</mixed-citation></citation-alternatives></ref><ref id="cit3"><label>3</label><citation-alternatives><mixed-citation xml:lang="ru">Mestetskiy L.M., Zykov V.P. Incremental markup of 19th-century handwritten archival diaries // Software &amp; Systems. 2025. Vol. 38, No. 4. 
https://doi.org/10.15827/0236-235X.152. (In Russ.)</mixed-citation><mixed-citation xml:lang="en">Mestetskiy L.M., Zykov V.P. Incremental markup of 19th-century handwritten archival diaries // Software &amp; Systems. 2025. Vol. 38, No. 4. https://doi.org/10.15827/0236-235X.152. (In Russ.)</mixed-citation></citation-alternatives></ref><ref id="cit4"><label>4</label><citation-alternatives><mixed-citation xml:lang="ru">Coquenet D., Chatelain C., Paquet T. End-to-end Handwritten Paragraph Text Recognition Using a Vertical Attention Network // IEEE Transactions on Pattern Analysis and Machine Intelligence. 2023. Vol. 45, No. 1. P. 508–524. https://doi.org/10.1109/TPAMI.2022.3144899</mixed-citation><mixed-citation xml:lang="en">Coquenet D., Chatelain C., Paquet T. End-to-end Handwritten Paragraph Text Recognition Using a Vertical Attention Network // IEEE Transactions on Pattern Analysis and Machine Intelligence. 2023. Vol. 45, No. 1. P. 508–524. https://doi.org/10.1109/TPAMI.2022.3144899</mixed-citation></citation-alternatives></ref><ref id="cit5"><label>5</label><citation-alternatives><mixed-citation xml:lang="ru">Boltunova E.M., Laptev A.K. Handwriting recognition and data mining: Possibilities of neural network technologies (based on admiral Fyodor Lutke's diary) // Imagology and Comparative Studies. 2025. No. 23. P. 358–379. https://doi.org/10.17223/24099554/23/17. (In Russ.)</mixed-citation><mixed-citation xml:lang="en">Boltunova E.M., Laptev A.K. Handwriting recognition and data mining: Possibilities of neural network technologies (based on admiral Fyodor Lutke's diary) // Imagology and Comparative Studies. 2025. No. 23. P. 358–379. https://doi.org/10.17223/24099554/23/17. (In Russ.)</mixed-citation></citation-alternatives></ref><ref id="cit6"><label>6</label><citation-alternatives><mixed-citation xml:lang="ru">Brown T.B., Mann B., Ryder N., Subbiah M. et al. Language Models are Few-Shot Learners // Advances in Neural Information Processing Systems (NeurIPS). 
2020. Vol. 33. P. 1877–1901.</mixed-citation><mixed-citation xml:lang="en">Brown T.B., Mann B., Ryder N., Subbiah M. et al. Language Models are Few-Shot Learners // Advances in Neural Information Processing Systems (NeurIPS). 2020. Vol. 33. P. 1877–1901.</mixed-citation></citation-alternatives></ref><ref id="cit7"><label>7</label><citation-alternatives><mixed-citation xml:lang="ru">Marti U.-V., Bunke H. The IAM-database: an English sentence database for offline handwriting recognition // International Journal on Document Analysis and Recognition (IJDAR). 2002. Vol. 5, No. 1. P. 39–46. https://doi.org/10.1007/s100320200071</mixed-citation><mixed-citation xml:lang="en">Marti U.-V., Bunke H. The IAM-database: an English sentence database for offline handwriting recognition // International Journal on Document Analysis and Recognition (IJDAR). 2002. Vol. 5, No. 1. P. 39–46. https://doi.org/10.1007/s100320200071</mixed-citation></citation-alternatives></ref><ref id="cit8"><label>8</label><citation-alternatives><mixed-citation xml:lang="ru">Sánchez J., Romero V., Toselli A. H., Vidal E. ICFHR2016 competition on handwritten text recognition on the READ dataset // Proceedings of the 15th International Conference on Frontiers in Handwriting Recognition (ICFHR 2016). 2016. P. 630–635.</mixed-citation><mixed-citation xml:lang="en">Sánchez J., Romero V., Toselli A. H., Vidal E. ICFHR2016 competition on handwritten text recognition on the READ dataset // Proceedings of the 15th International Conference on Frontiers in Handwriting Recognition (ICFHR 2016). 2016. P. 630–635.</mixed-citation></citation-alternatives></ref><ref id="cit9"><label>9</label><citation-alternatives><mixed-citation xml:lang="ru">Shi B., Bai X., Yao C. An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition // IEEE Transactions on Pattern Analysis and Machine Intelligence. 2017. Vol. 39, No. 11. P. 2298–2304. 
https://doi.org/10.1109/TPAMI.2016.2646371</mixed-citation><mixed-citation xml:lang="en">Shi B., Bai X., Yao C. An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition // IEEE Transactions on Pattern Analysis and Machine Intelligence. 2017. Vol. 39, No. 11. P. 2298–2304. https://doi.org/10.1109/TPAMI.2016.2646371</mixed-citation></citation-alternatives></ref><ref id="cit10"><label>10</label><citation-alternatives><mixed-citation xml:lang="ru">Graves A., Fernández S., Gomez F., Schmidhuber J. Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks // Proceedings of the 23rd International Conference on Machine Learning (ICML 2006). 2006. P. 369–376. https://doi.org/10.1145/1143844.1143891</mixed-citation><mixed-citation xml:lang="en">Graves A., Fernández S., Gomez F., Schmidhuber J. Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks // Proceedings of the 23rd International Conference on Machine Learning (ICML 2006). 2006. P. 369–376. https://doi.org/10.1145/1143844.1143891</mixed-citation></citation-alternatives></ref><ref id="cit11"><label>11</label><citation-alternatives><mixed-citation xml:lang="ru">Coquenet D., Chatelain C., Paquet T. SPAN: A Simple Predict &amp; Align Network for Handwritten Paragraph Recognition // Document Analysis and Recognition – ICDAR 2021. Lecture Notes in Computer Science, Vol. 12823. Springer, 2021. P. 70–84. https://doi.org/10.1007/978-3-030-86334-0_5</mixed-citation><mixed-citation xml:lang="en">Coquenet D., Chatelain C., Paquet T. SPAN: A Simple Predict &amp; Align Network for Handwritten Paragraph Recognition // Document Analysis and Recognition – ICDAR 2021. Lecture Notes in Computer Science, Vol. 12823. Springer, 2021. P. 70–84. 
https://doi.org/10.1007/978-3-030-86334-0_5</mixed-citation></citation-alternatives></ref><ref id="cit12"><label>12</label><citation-alternatives><mixed-citation xml:lang="ru">Yousef M., Bishop T.E. OrigamiNet: Weakly-Supervised, Segmentation-Free, One-Step, Full Page Text Recognition by Learning to Unfold // Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2020). 2020. P. 14710–14719. https://doi.org/10.1109/CVPR42600.2020.01472</mixed-citation><mixed-citation xml:lang="en">Yousef M., Bishop T.E. OrigamiNet: Weakly-Supervised, Segmentation-Free, One-Step, Full Page Text Recognition by Learning to Unfold // Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2020). 2020. P. 14710–14719. https://doi.org/10.1109/CVPR42600.2020.01472</mixed-citation></citation-alternatives></ref><ref id="cit13"><label>13</label><citation-alternatives><mixed-citation xml:lang="ru">Li M., Lv T., Chen J., Cui L., Lu Y., Florencio D., Zhang C., Li Z., Wei F. TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models // Proceedings of the AAAI Conference on Artificial Intelligence. 2023. Vol. 37, No. 12. P. 14216–14224.</mixed-citation><mixed-citation xml:lang="en">Li M., Lv T., Chen J., Cui L., Lu Y., Florencio D., Zhang C., Li Z., Wei F. TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models // Proceedings of the AAAI Conference on Artificial Intelligence. 2023. Vol. 37, No. 12. P. 14216–14224.</mixed-citation></citation-alternatives></ref><ref id="cit14"><label>14</label><citation-alternatives><mixed-citation xml:lang="ru">Potanin M., Dimitrov D., Shonenkov A., Bataev V., Karachev D., Novopoltsev M., Chertok A. Digital Peter: New Dataset, Competition and Handwriting Recognition Methods // Proceedings of the 6th International Workshop on Historical Document Imaging and Processing. ACM, 2021. P. 43–48. 
https://doi.org/10.1145/3476887.3476892</mixed-citation><mixed-citation xml:lang="en">Potanin M., Dimitrov D., Shonenkov A., Bataev V., Karachev D., Novopoltsev M., Chertok A. Digital Peter: New Dataset, Competition and Handwriting Recognition Methods // Proceedings of the 6th International Workshop on Historical Document Imaging and Processing. ACM, 2021. P. 43–48. https://doi.org/10.1145/3476887.3476892</mixed-citation></citation-alternatives></ref><ref id="cit15"><label>15</label><citation-alternatives><mixed-citation xml:lang="ru">Lakshminarayanan B., Pritzel A., Blundell C. Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles // Advances in Neural Information Processing Systems (NeurIPS). 2017. Vol. 30. P. 6402–6413.</mixed-citation><mixed-citation xml:lang="en">Lakshminarayanan B., Pritzel A., Blundell C. Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles // Advances in Neural Information Processing Systems (NeurIPS). 2017. Vol. 30. P. 6402–6413.</mixed-citation></citation-alternatives></ref></ref-list><fn-group><fn fn-type="conflict"><p>The authors declare that there are no conflicts of interest present.</p></fn></fn-group></back></article>
