<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.3 20210610//EN" "JATS-journalpublishing1-3.dtd">
<article article-type="research-article" dtd-version="1.3" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xml:lang="ru"><front><journal-meta><journal-id journal-id-type="publisher-id">ellibs</journal-id><journal-title-group><journal-title xml:lang="ru">Электронные библиотеки</journal-title><trans-title-group xml:lang="en"><trans-title>Russian Digital Libraries Journal</trans-title></trans-title-group></journal-title-group><issn pub-type="epub">1562-5419</issn><publisher><publisher-name>Казанский (Приволжский) федеральный университет</publisher-name></publisher></journal-meta><article-meta><article-id pub-id-type="doi">10.26907/1562-5419-2025-28-4-806-821</article-id><article-id custom-type="elpub" pub-id-type="custom">ellibs-600</article-id><article-categories><subj-group subj-group-type="heading"><subject>Research Article</subject></subj-group><subj-group subj-group-type="section-heading" xml:lang="ru"><subject>Статьи</subject></subj-group></article-categories><title-group><article-title>Сравнительный анализ текстов геологических публикаций с использованием больших языковых моделей</article-title><trans-title-group xml:lang="en"><trans-title>Comparative Analysis of Geological Texts Using Large Language Models</trans-title></trans-title-group></title-group><contrib-group><contrib contrib-type="author" corresp="yes"><name-alternatives><name name-style="eastern" xml:lang="ru"><surname>Патук</surname><given-names>Михаил Иванович</given-names></name><name name-style="western" xml:lang="en"><surname>Patuk</surname><given-names>Michail Ivanovich</given-names></name></name-alternatives><email xlink:type="simple">patuk@mail.ru</email><xref ref-type="aff" rid="aff-1"/></contrib><contrib contrib-type="author" corresp="yes"><name-alternatives><name name-style="eastern" xml:lang="ru"><surname>Наумова</surname><given-names>Вера Викторовна</given-names></name><name 
name-style="western" xml:lang="en"><surname>Naumova</surname><given-names>Vera Viktorovna</given-names></name></name-alternatives><email xlink:type="simple">naumova_new@mail.ru</email><xref ref-type="aff" rid="aff-1"/></contrib></contrib-group><aff-alternatives id="aff-1"><aff xml:lang="ru"><institution>Государственный геологический музей им. В. И. Вернадского РАН</institution></aff><aff xml:lang="en"><institution>Vernadsky State Geological Museum RAS</institution></aff></aff-alternatives><pub-date pub-type="collection"><year>2025</year></pub-date><pub-date pub-type="epub"><day>19</day><month>12</month><year>2025</year></pub-date><volume>28</volume><issue>4</issue><fpage>806</fpage><lpage>821</lpage><permissions><copyright-statement>Copyright &#x00A9; Патук М.И., Наумова В.В., 2025</copyright-statement><copyright-year>2025</copyright-year><copyright-holder xml:lang="ru">Патук М.И., Наумова В.В.</copyright-holder><copyright-holder xml:lang="en">Patuk M.I., Naumova V.V.</copyright-holder><license xml:lang="ru" license-type="creative-commons-attribution" xlink:href="https://creativecommons.org/licenses/by/4.0/" xlink:type="simple"><license-p>Данная работа распространяется под лицензией Creative Commons Attribution 4.0.</license-p></license><license xml:lang="en" license-type="creative-commons-attribution" xlink:href="https://creativecommons.org/licenses/by/4.0/" xlink:type="simple"><license-p>This work is licensed under a Creative Commons Attribution 4.0 License.</license-p></license></permissions><self-uri xlink:href="https://ellibs.elpub.ru/jour/article/view/600">https://ellibs.elpub.ru/jour/article/view/600</self-uri><abstract><p>Стремительный рост объема публикаций во всех областях геологических наук делает критически важным внедрение методов автоматизированной обработки научных текстов. Одним из наиболее перспективных инструментов для решения этой задачи выступают большие языковые модели на основе нейронных сетей. 
Огромный прорыв в области искусственного интеллекта за последние годы превратил такие модели в незаменимых помощников для исследователей. 
Наши работы по семантическому поиску публикаций с использованием дополнительно тренированных языковых моделей и нахождению меры близости геологических текстов показали хорошие результаты. Но используемые модели оказались неспособны выполнить глубокий анализ текстов. Сравнительный анализ современных архитектур позволил нам выделить модель DeepSeek R1, относящуюся к классу систем с расширенными возможностями логического вывода. Данный тип моделей демонстрирует принципиально новый уровень качества генерации. На базе выбранной модели разработан веб-сервис, предоставляющий уникальный функционал: сравнительный анализ до 5 научных статей стандартного объема; поддержку многоязычных источников (ввод текстов на английском, китайском, русском и др. языках); формирование структурированных отчетов на русском языке с выделением ключевых тезисов, противоречий и паттернов. Проведено тестирование предложенного подхода для сравнительного анализа геологических публикаций. Тестирование показало результаты, вызывающие доверие.
</p></abstract><trans-abstract xml:lang="en"><p>The rapid growth in the volume of publications across all fields of geology makes it crucial to introduce methods for the automated processing of scientific texts. Large language models based on neural networks are among the most promising tools for this task. Recent breakthroughs in artificial intelligence have turned such models into indispensable assistants for researchers. Our previous work on semantic search for publications using fine-tuned language models and on measuring the similarity of geological texts yielded good results. However, the models we used were unable to perform in-depth text analysis. A comparative analysis of current architectures led us to select DeepSeek R1, a model from the class of systems with advanced logical reasoning capabilities. Models of this type achieve a fundamentally new level of generation quality. Based on the selected model, we developed a web service that provides comparative analysis of up to five scientific articles of standard length; support for multilingual sources (texts can be submitted in English, Chinese, Russian, and other languages); and generation of structured reports in Russian that highlight key theses, contradictions, and patterns. The proposed approach was tested on the comparative analysis of geological publications, and the results proved credible.
</p></trans-abstract><kwd-group xml:lang="ru"><kwd>искусственный интеллект</kwd><kwd>большие языковые модели</kwd><kwd>обработка естественного языка</kwd><kwd>анализ текстов</kwd><kwd>геология</kwd></kwd-group><kwd-group xml:lang="en"><kwd>artificial intelligence</kwd><kwd>large language models</kwd><kwd>natural language processing</kwd><kwd>text analysis</kwd><kwd>geology</kwd></kwd-group></article-meta></front><back><ref-list><title>References</title><ref id="cit1"><label>1</label><citation-alternatives><mixed-citation xml:lang="ru">Large language model.</mixed-citation><mixed-citation xml:lang="en">Large language model.</mixed-citation></citation-alternatives></ref><ref id="cit2"><label>2</label><citation-alternatives><mixed-citation xml:lang="ru">URL: https://en.wikipedia.org/wiki/Large_language_model (date of access 01.10.2025)</mixed-citation><mixed-citation xml:lang="en">URL: https://en.wikipedia.org/wiki/Large_language_model (date of access 01.10.2025)</mixed-citation></citation-alternatives></ref><ref id="cit3"><label>3</label><citation-alternatives><mixed-citation xml:lang="ru">Patuk M.I., Naumova V.V. Artificial Intelligence Methods for Scientific Research in Geology // Russian Digital Libraries Journal. 2023. Vol. 26, No. 5. P. 673–696. (In Russ.). https://doi.org/10.26907/1562-5419-2023-26-5-673-696</mixed-citation><mixed-citation xml:lang="en">Patuk M.I., Naumova V.V. Artificial Intelligence Methods for Scientific Research in Geology // Russian Digital Libraries Journal. 2023. Vol. 26, No. 5. P. 673–696. (In Russ.). https://doi.org/10.26907/1562-5419-2023-26-5-673-696</mixed-citation></citation-alternatives></ref><ref id="cit4"><label>4</label><citation-alternatives><mixed-citation xml:lang="ru">Patuk M.I., Naumova V.V. Using Semantic Search to Select and Rank Geological Publications // Automatic Documentation and Mathematical Linguistics. 2024. Vol. 58, Suppl. 5. P. S294–S298. 
https://doi.org/10.3103/S0005105525700372</mixed-citation><mixed-citation xml:lang="en">Patuk M.I., Naumova V.V. Using Semantic Search to Select and Rank Geological Publications // Automatic Documentation and Mathematical Linguistics. 2024. Vol. 58, Suppl. 5. P. S294–S298. https://doi.org/10.3103/S0005105525700372</mixed-citation></citation-alternatives></ref><ref id="cit5"><label>5</label><citation-alternatives><mixed-citation xml:lang="ru">Patuk M.I., Naumova V.V., Eryomenko V.S. Digital repository "geologyscience.ru": open access to scientific publications on Russian geology // Russian Digital Libraries Journal. 2020. Vol. 23, No. 6. P. 1324–1338. (In Russ.).</mixed-citation><mixed-citation xml:lang="en">Patuk M.I., Naumova V.V., Eryomenko V.S. Digital repository "geologyscience.ru": open access to scientific publications on Russian geology // Russian Digital Libraries Journal. 2020. Vol. 23, No. 6. P. 1324–1338. (In Russ.).</mixed-citation></citation-alternatives></ref><ref id="cit6"><label>6</label><citation-alternatives><mixed-citation xml:lang="ru">Kilizhekov O.K., Tolstov A.V., Yakhin Sh.M., Zyryanov I.V. Diamond deposit of the Mir kimberlite pipe: main research stages, specific features and results of exploration // Russian Mining Industry. 2025. No. 1. P. 49–56 (In Russ.).</mixed-citation><mixed-citation xml:lang="en">Kilizhekov O.K., Tolstov A.V., Yakhin Sh.M., Zyryanov I.V. Diamond deposit of the Mir kimberlite pipe: main research stages, specific features and results of exploration // Russian Mining Industry. 2025. No. 1. P. 
49–56 (In Russ.).</mixed-citation></citation-alternatives></ref><ref id="cit7"><label>7</label><citation-alternatives><mixed-citation xml:lang="ru">https://doi.org/10.30686/1609-9192-2025-1-49-56</mixed-citation><mixed-citation xml:lang="en">https://doi.org/10.30686/1609-9192-2025-1-49-56</mixed-citation></citation-alternatives></ref><ref id="cit8"><label>8</label><citation-alternatives><mixed-citation xml:lang="ru">Shigley J., Chapman J., Ellison R. Discovery and Mining of the Argyle Diamond Deposit, Australia // Gems and Gemology. 2001. Vol. 37. P. 26–41. https://doi.org/10.5741/GEMS.37.1.26</mixed-citation><mixed-citation xml:lang="en">Shigley J., Chapman J., Ellison R. Discovery and Mining of the Argyle Diamond Deposit, Australia // Gems and Gemology. 2001. Vol. 37. P. 26–41. https://doi.org/10.5741/GEMS.37.1.26</mixed-citation></citation-alternatives></ref><ref id="cit9"><label>9</label><citation-alternatives><mixed-citation xml:lang="ru">ChatGPT.</mixed-citation><mixed-citation xml:lang="en">ChatGPT.</mixed-citation></citation-alternatives></ref><ref id="cit10"><label>10</label><citation-alternatives><mixed-citation xml:lang="ru">URL: https://en.wikipedia.org/wiki/ChatGPT (date of access 01.10.2025)</mixed-citation><mixed-citation xml:lang="en">URL: https://en.wikipedia.org/wiki/ChatGPT (date of access 01.10.2025)</mixed-citation></citation-alternatives></ref><ref id="cit11"><label>11</label><citation-alternatives><mixed-citation xml:lang="ru">Picazo-Sanchez P., Ortiz-Martin L. Analysing the impact of ChatGPT in research // Applied Intelligence. 2024. Vol. 54. P. 4172–4188.</mixed-citation><mixed-citation xml:lang="en">Picazo-Sanchez P., Ortiz-Martin L. Analysing the impact of ChatGPT in research // Applied Intelligence. 2024. Vol. 54. P. 
4172–4188.</mixed-citation></citation-alternatives></ref><ref id="cit12"><label>12</label><citation-alternatives><mixed-citation xml:lang="ru">https://doi.org/10.1007/s10489-024-05298-0</mixed-citation><mixed-citation xml:lang="en">https://doi.org/10.1007/s10489-024-05298-0</mixed-citation></citation-alternatives></ref><ref id="cit13"><label>13</label><citation-alternatives><mixed-citation xml:lang="ru">Islam I., Islam M.N. Exploring the opportunities and challenges of ChatGPT in academia // Discover Education. 2024. Vol. 3. Article no. 31. https://doi.org/10.1007/s44217-024-00114-w</mixed-citation><mixed-citation xml:lang="en">Islam I., Islam M.N. Exploring the opportunities and challenges of ChatGPT in academia // Discover Education. 2024. Vol. 3. Article no. 31. https://doi.org/10.1007/s44217-024-00114-w</mixed-citation></citation-alternatives></ref><ref id="cit14"><label>14</label><citation-alternatives><mixed-citation xml:lang="ru">Farhat F., Sohail Sh.S., Madsen D.Ø. How trustworthy is ChatGPT? The case of bibliometric analyses // Cogent Engineering. 2023. Vol. 10. Article no. 2222988. https://doi.org/10.1080/23311916.2023.2222988</mixed-citation><mixed-citation xml:lang="en">Farhat F., Sohail Sh.S., Madsen D.Ø. How trustworthy is ChatGPT? The case of bibliometric analyses // Cogent Engineering. 2023. Vol. 10. Article no. 2222988. https://doi.org/10.1080/23311916.2023.2222988</mixed-citation></citation-alternatives></ref><ref id="cit15"><label>15</label><citation-alternatives><mixed-citation xml:lang="ru">Zashikhina I.M. Scientific Article Writing: Will ChatGPT Help? // Vysshee obrazovanie v Rossii = Higher Education in Russia. 2023. Vol. 32, No. 8. P. 24–47.</mixed-citation><mixed-citation xml:lang="en">Zashikhina I.M. Scientific Article Writing: Will ChatGPT Help? // Vysshee obrazovanie v Rossii = Higher Education in Russia. 2023. Vol. 32, No. 8. P. 
24–47.</mixed-citation></citation-alternatives></ref><ref id="cit16"><label>16</label><citation-alternatives><mixed-citation xml:lang="ru">https://doi.org/10.31992/0869-3617-2023-32-8-9-24-47 (In Russ., abstract in Eng.)</mixed-citation><mixed-citation xml:lang="en">https://doi.org/10.31992/0869-3617-2023-32-8-9-24-47 (In Russ., abstract in Eng.)</mixed-citation></citation-alternatives></ref><ref id="cit17"><label>17</label><citation-alternatives><mixed-citation xml:lang="ru">Hallucination (artificial intelligence). URL: https://en.wikipedia.org/wiki/Hallucination_(artificial_intelligence) (date of access 01.10.2025)</mixed-citation><mixed-citation xml:lang="en">Hallucination (artificial intelligence). URL: https://en.wikipedia.org/wiki/Hallucination_(artificial_intelligence) (date of access 01.10.2025)</mixed-citation></citation-alternatives></ref><ref id="cit18"><label>18</label><citation-alternatives><mixed-citation xml:lang="ru">Salvagno M., Taccone F.S., Gerli A.G. Can artificial intelligence help for scientific writing? // Critical Care. 2023. Vol. 27. Article no. 75.</mixed-citation><mixed-citation xml:lang="en">Salvagno M., Taccone F.S., Gerli A.G. Can artificial intelligence help for scientific writing? // Critical Care. 2023. Vol. 27. Article no. 75.</mixed-citation></citation-alternatives></ref><ref id="cit19"><label>19</label><citation-alternatives><mixed-citation xml:lang="ru">https://doi.org/10.1186/s13054-023-04380-2</mixed-citation><mixed-citation xml:lang="en">https://doi.org/10.1186/s13054-023-04380-2</mixed-citation></citation-alternatives></ref><ref id="cit20"><label>20</label><citation-alternatives><mixed-citation xml:lang="ru">Ghorbanfekr H., Kerstens P.J., Dirix K. Classification of geological borehole descriptions using a domain adapted large language model // Applied Computing and Geosciences. 2025. Vol. 25. Article no. 100229.</mixed-citation><mixed-citation xml:lang="en">Ghorbanfekr H., Kerstens P.J., Dirix K. 
Classification of geological borehole descriptions using a domain adapted large language model // Applied Computing and Geosciences. 2025. Vol. 25. Article no. 100229.</mixed-citation></citation-alternatives></ref><ref id="cit21"><label>21</label><citation-alternatives><mixed-citation xml:lang="ru">LLM Leaderboard.</mixed-citation><mixed-citation xml:lang="en">LLM Leaderboard.</mixed-citation></citation-alternatives></ref><ref id="cit22"><label>22</label><citation-alternatives><mixed-citation xml:lang="ru">URL: https://artificialanalysis.ai/leaderboards/models (date of access 01.10.2025)</mixed-citation><mixed-citation xml:lang="en">URL: https://artificialanalysis.ai/leaderboards/models (date of access 01.10.2025)</mixed-citation></citation-alternatives></ref><ref id="cit23"><label>23</label><citation-alternatives><mixed-citation xml:lang="ru">T-lite. URL: https://huggingface.co/t-tech/T-lite-it-1.0-Q8_0-GGUF (date of access 01.10.2025)</mixed-citation><mixed-citation xml:lang="en">T-lite. URL: https://huggingface.co/t-tech/T-lite-it-1.0-Q8_0-GGUF (date of access 01.10.2025)</mixed-citation></citation-alternatives></ref><ref id="cit24"><label>24</label><citation-alternatives><mixed-citation xml:lang="ru">GigaChat. URL: https://giga.chat/ (date of access 01.10.2025)</mixed-citation><mixed-citation xml:lang="en">GigaChat. URL: https://giga.chat/ (date of access 01.10.2025)</mixed-citation></citation-alternatives></ref><ref id="cit25"><label>25</label><citation-alternatives><mixed-citation xml:lang="ru">DeepSeek. URL: https://www.deepseek.com/en (date of access 01.10.2025)</mixed-citation><mixed-citation xml:lang="en">DeepSeek. URL: https://www.deepseek.com/en (date of access 01.10.2025)</mixed-citation></citation-alternatives></ref></ref-list><fn-group><fn fn-type="conflict"><p>The authors declare no conflicts of interest.</p></fn></fn-group></back></article>
