<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.3 20210610//EN" "JATS-journalpublishing1-3.dtd">
<article article-type="research-article" dtd-version="1.3" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xml:lang="ru"><front><journal-meta><journal-id journal-id-type="publisher-id">ellibs</journal-id><journal-title-group><journal-title xml:lang="ru">Электронные библиотеки</journal-title><trans-title-group xml:lang="en"><trans-title>Russian Digital Libraries Journal</trans-title></trans-title-group></journal-title-group><issn pub-type="epub">1562-5419</issn><publisher><publisher-name>Казанский (Приволжский) федеральный университет</publisher-name></publisher></journal-meta><article-meta><article-id pub-id-type="doi">10.26907/1562-5419-2025-28-5-1267-1278</article-id><article-id custom-type="elpub" pub-id-type="custom">ellibs-619</article-id><article-categories><subj-group subj-group-type="heading"><subject>Research Article</subject></subj-group><subj-group subj-group-type="section-heading" xml:lang="ru"><subject>Статьи</subject></subj-group></article-categories><title-group><article-title>Стилометрический анализ в задаче поиска заимствований текстов на татарском языке</article-title><trans-title-group xml:lang="en"><trans-title>Stylometric Analysis in the Task of Searching for Borrowings of Texts in the Tatar Language</trans-title></trans-title-group></title-group><contrib-group><contrib contrib-type="author" corresp="yes"><name-alternatives><name name-style="eastern" xml:lang="ru"><surname>Хаялеева</surname><given-names>Изида Зуфаровна</given-names></name><name name-style="western" xml:lang="en"><surname>Khayaleeva</surname><given-names>Izida Zufarovna</given-names></name></name-alternatives><email xlink:type="simple">izidakh@yandex.ru</email><xref ref-type="aff" rid="aff-1"/></contrib><contrib contrib-type="author" corresp="yes"><name-alternatives><name name-style="eastern" xml:lang="ru"><surname>Абрамский</surname><given-names>Михаил Михайлович</given-names></name><name name-style="western" xml:lang="en"><surname>Abramskiy</surname><given-names>Mikhail Mikhailovich</given-names></name></name-alternatives><email xlink:type="simple">mabramsk@kpfu.ru</email><xref ref-type="aff" rid="aff-1"/></contrib></contrib-group><aff-alternatives id="aff-1"><aff xml:lang="ru"><institution>Казанский (Приволжский) федеральный университет</institution></aff><aff xml:lang="en"><institution>Kazan (Volga region) Federal University</institution></aff></aff-alternatives><pub-date pub-type="collection"><year>2025</year></pub-date><pub-date pub-type="epub"><day>19</day><month>12</month><year>2025</year></pub-date><volume>28</volume><issue>5</issue><fpage>1267</fpage><lpage>1278</lpage><permissions><copyright-statement>Copyright &#x00A9; Хаялеева И.З., Абрамский М.М., 2025</copyright-statement><copyright-year>2025</copyright-year><copyright-holder xml:lang="ru">Хаялеева И.З., Абрамский М.М.</copyright-holder><copyright-holder xml:lang="en">Khayaleeva I.Z., Abramskiy M.M.</copyright-holder><license xml:lang="ru" license-type="creative-commons-attribution" xlink:href="https://creativecommons.org/licenses/by/4.0/" xlink:type="simple"><license-p>Данная работа распространяется под лицензией Creative Commons Attribution 4.0.</license-p></license><license xml:lang="en" license-type="creative-commons-attribution" xlink:href="https://creativecommons.org/licenses/by/4.0/" xlink:type="simple"><license-p>This work is licensed under a Creative Commons Attribution 4.0 License.</license-p></license></permissions><self-uri xlink:href="https://ellibs.elpub.ru/jour/article/view/619">https://ellibs.elpub.ru/jour/article/view/619</self-uri><abstract><p>Рассмотрена возможность применения методов стилометрического анализа для поиска заимствований в текстах на татарском языке. 
Разработаны соответствующие инструменты, в которых использованы алгоритмы машинного обучения, включая кластеризацию (метод k-средних), классификацию (метод случайного леса, метод опорных векторов, наивный байесовский классификатор) и гибридный подход (модель FastText + логистическая регрессия). Особое внимание уделено адаптации лингвистических метрик для татарского языка.
</p></abstract><trans-abstract xml:lang="en"><p>This article examines how stylometric analysis methods can be applied to detecting borrowings in Tatar-language texts. Corresponding tools have been developed that use machine learning algorithms, including clustering (k-means), classification (random forest, support vector machine, naive Bayes classifier), and a hybrid approach (a FastText model combined with logistic regression). Particular attention is paid to adapting linguistic metrics to the Tatar language.
</p></trans-abstract><kwd-group xml:lang="ru"><kwd>поиск заимствований</kwd><kwd>обработка естественного языка</kwd><kwd>стилометрический анализ</kwd><kwd>татарский язык</kwd></kwd-group><kwd-group xml:lang="en"><kwd>plagiarism detection</kwd><kwd>natural language processing</kwd><kwd>stylometric analysis</kwd><kwd>Tatar language</kwd></kwd-group></article-meta></front><back><ref-list><title>References</title><ref id="cit1"><label>1</label><citation-alternatives><mixed-citation xml:lang="ru">Postanovlenie Kabineta Ministrov Respubliki Tatarstan "Ob Utverzhdenii Gosudarstvennoy Programmy Sokhranenie, Izucheniye i Razvitie Gosudarstvennykh Yazykov Respubliki Tatarstan i Drugikh Yazykov v Respublike Tatarstan na 2023–2030 Gody" // Official Portal of Juridical Information of Republic of Tatarstan. Kazan, 2020. URL: https://pravo.tatarstan.ru/npa_kabmin/post/?npa_id=625356 (access date: 19.08.2025).</mixed-citation><mixed-citation xml:lang="en">Postanovlenie Kabineta Ministrov Respubliki Tatarstan "Ob Utverzhdenii Gosudarstvennoy Programmy Sokhranenie, Izucheniye i Razvitie Gosudarstvennykh Yazykov Respubliki Tatarstan i Drugikh Yazykov v Respublike Tatarstan na 2023–2030 Gody" // Official Portal of Juridical Information of Republic of Tatarstan. Kazan, 2020. URL: https://pravo.tatarstan.ru/npa_kabmin/post/?npa_id=625356 (access date: 19.08.2025).</mixed-citation></citation-alternatives></ref><ref id="cit2"><label>2</label><citation-alternatives><mixed-citation xml:lang="ru">Karimov K.Kh., Vasily E.A. Teoreticheskie osnovy klasterizacii dannyh // Aktual'nye voprosy fundamental'nyh i prikladnyh nauchnyh issledovanij. 2023. P. 242–247.</mixed-citation><mixed-citation xml:lang="en">Karimov K.Kh., Vasily E.A. Teoreticheskie osnovy klasterizacii dannyh // Aktual'nye voprosy fundamental'nyh i prikladnyh nauchnyh issledovanij. 2023. P. 
242–247.</mixed-citation></citation-alternatives></ref><ref id="cit3"><label>3</label><citation-alternatives><mixed-citation xml:lang="ru">Balyasova I.I. Parametry Slozhnosti Teksta v Tatarskom Yazyke // Vyzovy i Trendy Mirovoy Lingvistiki. 2020. Vol. 16. P. 302.</mixed-citation><mixed-citation xml:lang="en">Balyasova I.I. Parametry Slozhnosti Teksta v Tatarskom Yazyke // Vyzovy i Trendy Mirovoy Lingvistiki. 2020. Vol. 16. P. 302.</mixed-citation></citation-alternatives></ref><ref id="cit4"><label>4</label><citation-alternatives><mixed-citation xml:lang="ru">Solnyshkina M.I., McNamara D.S., Zamaletdinov R.R. Obrabotka Yestestvennogo Yazyka i Izucheniye Slozhnosti Diskursa // Russian Journal of Linguistics. 2022. Vol. 26, No. 2. P. 317–341.</mixed-citation><mixed-citation xml:lang="en">Solnyshkina M.I., McNamara D.S., Zamaletdinov R.R. Obrabotka Yestestvennogo Yazyka i Izucheniye Slozhnosti Diskursa // Russian Journal of Linguistics. 2022. Vol. 26, No. 2. P. 317–341.</mixed-citation></citation-alternatives></ref><ref id="cit5"><label>5</label><citation-alternatives><mixed-citation xml:lang="ru">Scott M., Tribble C. Textual patterns: Key words and corpus analysis in language education. Amsterdam: John Benjamins Publishing, 2006. 203 p.</mixed-citation><mixed-citation xml:lang="en">Scott M., Tribble C. Textual patterns: Key words and corpus analysis in language education. Amsterdam: John Benjamins Publishing, 2006. 203 p.</mixed-citation></citation-alternatives></ref><ref id="cit6"><label>6</label><citation-alternatives><mixed-citation xml:lang="ru">Honoré A. Some simple measures of richness of vocabulary // Association for Literary and Linguistic Computing Bulletin. 1979. Vol. 7, No. 2. P. 172–177.</mixed-citation><mixed-citation xml:lang="en">Honoré A. Some simple measures of richness of vocabulary // Association for Literary and Linguistic Computing Bulletin. 1979. Vol. 7, No. 2. P. 
172–177.</mixed-citation></citation-alternatives></ref><ref id="cit7"><label>7</label><citation-alternatives><mixed-citation xml:lang="ru">Flesch R. A new readability yardstick // Journal of Applied Psychology. 1948. Vol. 32, No. 3. P. 221–233.</mixed-citation><mixed-citation xml:lang="en">Flesch R. A new readability yardstick // Journal of Applied Psychology. 1948. Vol. 32, No. 3. P. 221–233.</mixed-citation></citation-alternatives></ref><ref id="cit8"><label>8</label><citation-alternatives><mixed-citation xml:lang="ru">Kincaid J.P., Fishburne R.P. Jr., Rogers R.L., Chissom B.S. Derivation of new readability formulas (Automated Readability Index, Fog Count and Flesch Reading Ease Formula) for Navy enlisted personnel // Institute for Simulation and Training. 1975. 49 p.</mixed-citation><mixed-citation xml:lang="en">Kincaid J.P., Fishburne R.P. Jr., Rogers R.L., Chissom B.S. Derivation of new readability formulas (Automated Readability Index, Fog Count and Flesch Reading Ease Formula) for Navy enlisted personnel // Institute for Simulation and Training. 1975. 49 p.</mixed-citation></citation-alternatives></ref><ref id="cit9"><label>9</label><citation-alternatives><mixed-citation xml:lang="ru">Kuzman T., Ljubešić N. Automatic genre identification: a survey // Language Resources and Evaluation. 2025. Vol. 59, No. 1. P. 537–570.</mixed-citation><mixed-citation xml:lang="en">Kuzman T., Ljubešić N. Automatic genre identification: a survey // Language Resources and Evaluation. 2025. Vol. 59, No. 1. P. 537–570.</mixed-citation></citation-alternatives></ref><ref id="cit10"><label>10</label><citation-alternatives><mixed-citation xml:lang="ru">Salman H.A., Kalakech A., Steiti A. Random forest algorithm overview // Babylonian Journal of Machine Learning. 2024. Vol. 2024. P. 69–79.</mixed-citation><mixed-citation xml:lang="en">Salman H.A., Kalakech A., Steiti A. Random forest algorithm overview // Babylonian Journal of Machine Learning. 2024. Vol. 2024. P. 
69–79.</mixed-citation></citation-alternatives></ref><ref id="cit11"><label>11</label><citation-alternatives><mixed-citation xml:lang="ru">Bansal M., Goyal A., Choudhary A. A comparative analysis of K-nearest neighbor, genetic, support vector machine, decision tree, and long short-term memory algorithms in machine learning // Decision Analytics Journal. 2022. Vol. 3. P. 100071.</mixed-citation><mixed-citation xml:lang="en">Bansal M., Goyal A., Choudhary A. A comparative analysis of K-nearest neighbor, genetic, support vector machine, decision tree, and long short-term memory algorithms in machine learning // Decision Analytics Journal. 2022. Vol. 3. P. 100071.</mixed-citation></citation-alternatives></ref><ref id="cit12"><label>12</label><citation-alternatives><mixed-citation xml:lang="ru">Rastogi S., Sambyal R., Tyagi P., Kushwaha R. Multinomial Naive Bayes Classification Algorithm Based Robust Spam Detection System // 2024 OPJU International Technology Conference (OTCON) on Smart Computing for Innovation and Advancement in Industry 4.0. IEEE, 2024. P. 1–5.</mixed-citation><mixed-citation xml:lang="en">Rastogi S., Sambyal R., Tyagi P., Kushwaha R. Multinomial Naive Bayes Classification Algorithm Based Robust Spam Detection System // 2024 OPJU International Technology Conference (OTCON) on Smart Computing for Innovation and Advancement in Industry 4.0. IEEE, 2024. P. 1–5.</mixed-citation></citation-alternatives></ref><ref id="cit13"><label>13</label><citation-alternatives><mixed-citation xml:lang="ru">Khusainova A., Khan A., Rivera A.R. SART - Similarity, analogies, and relatedness for Tatar language: New benchmark datasets for word embeddings evaluation // International Conference on Computational Linguistics and Intelligent Text Processing. Cham: Springer Nature Switzerland, 2019. P. 380–390.</mixed-citation><mixed-citation xml:lang="en">Khusainova A., Khan A., Rivera A.R. SART - Similarity, analogies, and relatedness for Tatar language: New benchmark datasets for word embeddings evaluation // International Conference on Computational Linguistics and Intelligent Text Processing. Cham: Springer Nature Switzerland, 2019. P. 380–390.</mixed-citation></citation-alternatives></ref><ref id="cit14"><label>14</label><citation-alternatives><mixed-citation xml:lang="ru">Pedregosa F. et al. Scikit-learn: Machine Learning in Python // Journal of Machine Learning Research. 2011. Vol. 12. P. 2825–2830.</mixed-citation><mixed-citation xml:lang="en">Pedregosa F. et al. Scikit-learn: Machine Learning in Python // Journal of Machine Learning Research. 2011. Vol. 12. P. 2825–2830.</mixed-citation></citation-alternatives></ref><ref id="cit15"><label>15</label><citation-alternatives><mixed-citation xml:lang="ru">Conneau A., Khandelwal K., Goyal N., Chaudhary V., Wenzek G., Guzmán F., Grave E., Ott M., Zettlemoyer L., Stoyanov V. Unsupervised Cross-lingual Representation Learning at Scale // Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020. P. 8440–8451.</mixed-citation><mixed-citation xml:lang="en">Conneau A., Khandelwal K., Goyal N., Chaudhary V., Wenzek G., Guzmán F., Grave E., Ott M., Zettlemoyer L., Stoyanov V. Unsupervised Cross-lingual Representation Learning at Scale // Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020. P. 8440–8451.</mixed-citation></citation-alternatives></ref></ref-list><fn-group><fn fn-type="conflict"><p>The authors declare no conflicts of interest.</p></fn></fn-group></back></article>
