<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.3 20210610//EN" "JATS-journalpublishing1-3.dtd">
<article article-type="research-article" dtd-version="1.3" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xml:lang="ru"><front><journal-meta><journal-id journal-id-type="publisher-id">ellibs</journal-id><journal-title-group><journal-title xml:lang="ru">Электронные библиотеки</journal-title><trans-title-group xml:lang="en"><trans-title>Russian Digital Libraries Journal</trans-title></trans-title-group></journal-title-group><issn pub-type="epub">1562-5419</issn><publisher><publisher-name>Казанский (Приволжский) федеральный университет</publisher-name></publisher></journal-meta><article-meta><article-id custom-type="elpub" pub-id-type="custom">ellibs-722</article-id><article-categories><subj-group subj-group-type="heading"><subject>Research Article</subject></subj-group><subj-group subj-group-type="section-heading" xml:lang="ru"><subject>Статьи</subject></subj-group></article-categories><title-group><article-title>Автоматическое добавление SEO-метаданных в новостные статьи с использованием QWEN-coder</article-title><trans-title-group xml:lang="en"><trans-title>Automatic Addition of SEO Metadata to News Articles Using Qwen-Coder</trans-title></trans-title-group></title-group><contrib-group><contrib contrib-type="author" corresp="yes"><name-alternatives><name name-style="eastern" xml:lang="ru"><surname>Салем</surname><given-names>Хамза</given-names></name><name name-style="western" xml:lang="en"><surname>Salem</surname><given-names>Hamza</given-names></name></name-alternatives><email xlink:type="simple">h.salem@innopolis.ru</email><xref ref-type="aff" rid="aff-1"/></contrib><contrib contrib-type="author" corresp="yes"><name-alternatives><name name-style="eastern" xml:lang="ru"><surname>Тощев</surname><given-names>Александр Сергеевич</given-names></name><name name-style="western" xml:lang="en"><surname>Toschev</surname><given-names>Alexander 
Sergeevich</given-names></name></name-alternatives><email xlink:type="simple">atoschev@kpfu.ru</email><xref ref-type="aff" rid="aff-2"/></contrib></contrib-group><aff-alternatives id="aff-1"><aff xml:lang="ru"><institution>Университет Иннополис</institution></aff><aff xml:lang="en"><institution>Innopolis University</institution></aff></aff-alternatives><aff-alternatives id="aff-2"><aff xml:lang="ru"><institution>Казанский (Приволжский) федеральный университет</institution></aff><aff xml:lang="en"><institution>Kazan (Volga region) Federal University</institution></aff></aff-alternatives><pub-date pub-type="collection"><year>2026</year></pub-date><pub-date pub-type="epub"><day>04</day><month>03</month><year>2026</year></pub-date><volume>29</volume><issue>1</issue><fpage>287</fpage><lpage>303</lpage><permissions><copyright-statement>Copyright © Салем Х., Тощев А.С., 2026</copyright-statement><copyright-year>2026</copyright-year><copyright-holder xml:lang="ru">Салем Х., Тощев А.С.</copyright-holder><copyright-holder xml:lang="en">Salem H., Toschev A.S.</copyright-holder><license xml:lang="ru" license-type="creative-commons-attribution" xlink:href="https://creativecommons.org/licenses/by/4.0/" xlink:type="simple"><license-p>Данная работа распространяется под лицензией Creative Commons Attribution 4.0.</license-p></license><license xml:lang="en" license-type="creative-commons-attribution" xlink:href="https://creativecommons.org/licenses/by/4.0/" xlink:type="simple"><license-p>This work is licensed under a Creative Commons Attribution 4.0 License.</license-p></license></permissions><self-uri xlink:href="https://ellibs.elpub.ru/jour/article/view/722">https://ellibs.elpub.ru/jour/article/view/722</self-uri><abstract><p>Обобщен ранее разработанный конвейер обогащения новостных статей структурированными метаданными и представлена его обновленная конфигурация, в которой GPT-3 (Generative Pre-trained Transformer 3) – языковая модель от компании OpenAI – заменен на 
открытую модель Qwen-Coder. Новая версия, как и ранее, использует набор из 400 страниц, отобранных через Google News, и остается совместимой с Google Rich Results Test. Эксперименты показали, что качество, сопоставимое с GPT-3, достижимо при локальном запуске на типовом офисном настольном компьютере (CPU, без GPU). Установлено, что замена, указанная выше, снижает зависимость от платных облачных сервисов и обеспечивает более высокую производительность по сравнению с GPT-версией; дана оценка сходства результатов обогащения для Qwen-Coder относительно базовой реализации на GPT-3. Предложенные инструменты снижают порог внедрения семантической разметки и расширяют ее практическое применение, в том числе в цифровой журналистике.
</p></abstract><trans-abstract xml:lang="en"><p>This paper summarizes a previously developed pipeline for enriching news articles with structured metadata and presents an updated configuration in which GPT-3 (Generative Pre-trained Transformer 3), OpenAI’s language model, is replaced with the open model Qwen-Coder. As before, the pipeline uses a dataset of 400 pages selected through Google News, Google’s free news aggregator, and remains compatible with the Google Rich Results Test, Google’s tool for validating structured data that is eligible for rich results. Experiments show that output quality comparable to GPT-3 is achievable when the model runs locally on a typical office desktop computer (CPU only, no GPU). The substitution reduces dependence on paid cloud services and yields higher performance than the GPT-based version; we also report an evaluation of how similar the enrichment outputs of Qwen-Coder are to those of the GPT-3 baseline. The proposed tools lower the barrier to adopting semantic markup and broaden its practical application, including in digital journalism. Overall, the findings support Qwen-Coder as a cost-effective alternative to large proprietary models for metadata enrichment tasks.
</p></trans-abstract><kwd-group xml:lang="ru"><kwd>семантическая паутина</kwd><kwd>майнинг шаблонов</kwd><kwd>Qwen-Coder</kwd><kwd>новостные веб-страницы</kwd><kwd>читабельность</kwd><kwd>структурированные данные</kwd></kwd-group><kwd-group xml:lang="en"><kwd>semantic web</kwd><kwd>pattern mining</kwd><kwd>Qwen-Coder</kwd><kwd>news web pages</kwd><kwd>readability</kwd><kwd>structured data</kwd></kwd-group></article-meta></front><back><ref-list><title>References</title><ref id="cit1"><label>1</label><citation-alternatives><mixed-citation xml:lang="ru">Hui B., Yang J., Cui Z. et al. Qwen2.5-Coder Technical Report // arXiv. 2024. arXiv:2409.12186. URL: https://arxiv.org/abs/2409.12186 (access date: 10.01.2026).</mixed-citation><mixed-citation xml:lang="en">Hui B., Yang J., Cui Z. et al. Qwen2.5-Coder Technical Report // arXiv. 2024. arXiv:2409.12186. URL: https://arxiv.org/abs/2409.12186 (access date: 10.01.2026).</mixed-citation></citation-alternatives></ref><ref id="cit2"><label>2</label><citation-alternatives><mixed-citation xml:lang="ru">Wang Q. Normalization and Differentiation in Google News: A Multi-Method Analysis of the World’s Largest News Aggregator: Thesis. Rutgers University, NJ, USA, 2020.</mixed-citation><mixed-citation xml:lang="en">Wang Q. Normalization and Differentiation in Google News: A Multi-Method Analysis of the World’s Largest News Aggregator: Thesis. Rutgers University, NJ, USA, 2020.</mixed-citation></citation-alternatives></ref><ref id="cit3"><label>3</label><citation-alternatives><mixed-citation xml:lang="ru">Rich Results Test. URL: https://search.google.com/test/rich-results (access date: 08.10.2024).</mixed-citation><mixed-citation xml:lang="en">Rich Results Test. URL: https://search.google.com/test/rich-results (access date: 08.10.2024).</mixed-citation></citation-alternatives></ref><ref id="cit4"><label>4</label><citation-alternatives><mixed-citation xml:lang="ru">Bashir F., Warraich N.F. 
Systematic literature review of Semantic Web for distance learning // Interactive Learning Environments. 2020. Vol. 31. P. 527–543.</mixed-citation><mixed-citation xml:lang="en">Bashir F., Warraich N.F. Systematic literature review of Semantic Web for distance learning // Interactive Learning Environments. 2020. Vol. 31. P. 527–543.</mixed-citation></citation-alternatives></ref><ref id="cit5"><label>5</label><citation-alternatives><mixed-citation xml:lang="ru">Breit A., Waltersdorfer L., Ekaputra F.J., Sabou M., Ekelhart A., Iana A., Paulheim H., Portisch J., Revenko A., Teije A.T., et al. Combining Machine Learning and Semantic Web: A Systematic Mapping Study // ACM Computing Surveys. 2023. Vol. 55. Art. 313.</mixed-citation><mixed-citation xml:lang="en">Breit A., Waltersdorfer L., Ekaputra F.J., Sabou M., Ekelhart A., Iana A., Paulheim H., Portisch J., Revenko A., Teije A.T., et al. Combining Machine Learning and Semantic Web: A Systematic Mapping Study // ACM Computing Surveys. 2023. Vol. 55. Art. 313.</mixed-citation></citation-alternatives></ref><ref id="cit6"><label>6</label><citation-alternatives><mixed-citation xml:lang="ru">Yu L. Introduction to the Semantic Web and Semantic Web Services. Boca Raton, FL, USA: Chapman and Hall/CRC, 2007.</mixed-citation><mixed-citation xml:lang="en">Yu L. Introduction to the Semantic Web and Semantic Web Services. Boca Raton, FL, USA: Chapman and Hall/CRC, 2007.</mixed-citation></citation-alternatives></ref><ref id="cit7"><label>7</label><citation-alternatives><mixed-citation xml:lang="ru">Sporny M., Longley D., Kellogg G., Lanthaler M., Lindström N. JSON-LD 1.1: W3C Recommendation. 2020.</mixed-citation><mixed-citation xml:lang="en">Sporny M., Longley D., Kellogg G., Lanthaler M., Lindström N. JSON-LD 1.1: W3C Recommendation. 2020.</mixed-citation></citation-alternatives></ref><ref id="cit8"><label>8</label><citation-alternatives><mixed-citation xml:lang="ru">Salem H., Salloum H., Orabi O., Sabbagh K., Mazzara M. 
Enhancing News Articles: Automatic SEO Linked Data Injection for Semantic Web Integration // Applied Sciences. 2025. Vol. 15. Art. 1262. https://doi.org/10.3390/app15031262.</mixed-citation><mixed-citation xml:lang="en">Salem H., Salloum H., Orabi O., Sabbagh K., Mazzara M. Enhancing News Articles: Automatic SEO Linked Data Injection for Semantic Web Integration // Applied Sciences. 2025. Vol. 15. Art. 1262. https://doi.org/10.3390/app15031262.</mixed-citation></citation-alternatives></ref><ref id="cit9"><label>9</label><citation-alternatives><mixed-citation xml:lang="ru">OpenAI. GPT-3 powers the next generation of apps. 2021. URL: https://openai.com/index/gpt-3-apps/ (access date: 16.01.2026).</mixed-citation><mixed-citation xml:lang="en">OpenAI. GPT-3 powers the next generation of apps. 2021. URL: https://openai.com/index/gpt-3-apps/ (access date: 16.01.2026).</mixed-citation></citation-alternatives></ref><ref id="cit10"><label>10</label><citation-alternatives><mixed-citation xml:lang="ru">Shadbolt N., Berners-Lee T., Hall W. The Semantic Web Revisited // IEEE Intelligent Systems. 2006. Vol. 21. P. 96–101.</mixed-citation><mixed-citation xml:lang="en">Shadbolt N., Berners-Lee T., Hall W. The Semantic Web Revisited // IEEE Intelligent Systems. 2006. Vol. 21. P. 96–101.</mixed-citation></citation-alternatives></ref><ref id="cit11"><label>11</label><citation-alternatives><mixed-citation xml:lang="ru">Poturak M., Keco D., Tutnic E. Influence of search engine optimization (SEO) on business performance: Case study of private university in Sarajevo // International Journal of Research in Business and Social Science. 2022. Vol. 11. P. 59–68.</mixed-citation><mixed-citation xml:lang="en">Poturak M., Keco D., Tutnic E. Influence of search engine optimization (SEO) on business performance: Case study of private university in Sarajevo // International Journal of Research in Business and Social Science. 2022. Vol. 11. P. 
59–68.</mixed-citation></citation-alternatives></ref><ref id="cit12"><label>12</label><citation-alternatives><mixed-citation xml:lang="ru">Chandrasekaran B., Josephson J.R., Benjamins V.R. What are ontologies, and why do we need them? // IEEE Intelligent Systems and Applications. 1999. Vol. 14. P. 20–26.</mixed-citation><mixed-citation xml:lang="en">Chandrasekaran B., Josephson J.R., Benjamins V.R. What are ontologies, and why do we need them? // IEEE Intelligent Systems and Applications. 1999. Vol. 14. P. 20–26.</mixed-citation></citation-alternatives></ref><ref id="cit13"><label>13</label><citation-alternatives><mixed-citation xml:lang="ru">Sporny M., Longley D., Kellogg G., Lanthaler M., Lindström N. JSON-LD 1.0: W3C Recommendation. 2014.</mixed-citation><mixed-citation xml:lang="en">Sporny M., Longley D., Kellogg G., Lanthaler M., Lindström N. JSON-LD 1.0: W3C Recommendation. 2014.</mixed-citation></citation-alternatives></ref><ref id="cit14"><label>14</label><citation-alternatives><mixed-citation xml:lang="ru">Adida B., Birbeck M., McCarron S., Pemberton S. RDFa in XHTML: Syntax and processing: W3C Recommendation. 2008.</mixed-citation><mixed-citation xml:lang="en">Adida B., Birbeck M., McCarron S., Pemberton S. RDFa in XHTML: Syntax and processing: W3C Recommendation. 2008.</mixed-citation></citation-alternatives></ref><ref id="cit15"><label>15</label><citation-alternatives><mixed-citation xml:lang="ru">Iqbal M., Khalid M.N., Manzoor A.A., Malik M., Shaikh N.A. Search Engine Optimization (SEO): A Study of important key factors in achieving a better Search Engine Result Page (SERP) Position // Sukkur IBA Journal of Computing and Mathematical Sciences. 2022. Vol. 6. P. 1–15.</mixed-citation><mixed-citation xml:lang="en">Iqbal M., Khalid M.N., Manzoor A.A., Malik M., Shaikh N.A. 
Search Engine Optimization (SEO): A Study of important key factors in achieving a better Search Engine Result Page (SERP) Position // Sukkur IBA Journal of Computing and Mathematical Sciences. 2022. Vol. 6. P. 1–15.</mixed-citation></citation-alternatives></ref><ref id="cit16"><label>16</label><citation-alternatives><mixed-citation xml:lang="ru">Alfiana F., Khofifah N., Ramadhan T., Septiani N., Wahyuningsih W., Azizah N.N., Ramadhona N. Apply the Search Engine Optimization (SEO) Method to determine Website Ranking on Search Engines // International Journal of Cyber Services and Management. 2023. Vol. 3. P. 65–73.</mixed-citation><mixed-citation xml:lang="en">Alfiana F., Khofifah N., Ramadhan T., Septiani N., Wahyuningsih W., Azizah N.N., Ramadhona N. Apply the Search Engine Optimization (SEO) Method to determine Website Ranking on Search Engines // International Journal of Cyber Services and Management. 2023. Vol. 3. P. 65–73.</mixed-citation></citation-alternatives></ref><ref id="cit17"><label>17</label><citation-alternatives><mixed-citation xml:lang="ru">Mbonigaba C., Sujatha S., Kumar A.D., Vasuki M. Leveraging Digital Channels for Customer Engagement and Sales: Evaluating SEO, Content Marketing, and Social Media for Brand Growth // International Journal of Engineering Research and Modern Education. 2024. Vol. 9. P. 32–40.</mixed-citation><mixed-citation xml:lang="en">Mbonigaba C., Sujatha S., Kumar A.D., Vasuki M. Leveraging Digital Channels for Customer Engagement and Sales: Evaluating SEO, Content Marketing, and Social Media for Brand Growth // International Journal of Engineering Research and Modern Education. 2024. Vol. 9. P. 32–40.</mixed-citation></citation-alternatives></ref><ref id="cit18"><label>18</label><citation-alternatives><mixed-citation xml:lang="ru">Lew O.D., Kammerer Y. Factors influencing viewing behavior on search engine results pages: A review of eye-tracking research // Behavior &amp; Information Technology. 2020. Vol. 40. P. 
1485–1515.</mixed-citation><mixed-citation xml:lang="en">Lew O.D., Kammerer Y. Factors influencing viewing behavior on search engine results pages: A review of eye-tracking research // Behavior &amp; Information Technology. 2020. Vol. 40. P. 1485–1515.</mixed-citation></citation-alternatives></ref><ref id="cit19"><label>19</label><citation-alternatives><mixed-citation xml:lang="ru">Rahman A.F.R., Alam H., Hartono R. Content Extraction from HTML Documents // Proceedings of the 1st International Workshop on Web Document Analysis (WDA2001). Seattle, WA, USA, 8 September 2001.</mixed-citation><mixed-citation xml:lang="en">Rahman A.F.R., Alam H., Hartono R. Content Extraction from HTML Documents // Proceedings of the 1st International Workshop on Web Document Analysis (WDA2001). Seattle, WA, USA, 8 September 2001.</mixed-citation></citation-alternatives></ref><ref id="cit20"><label>20</label><citation-alternatives><mixed-citation xml:lang="ru">Lima R., Espinasse B., Oliveira H., Pentagrossa L., Freitas F. Information Extraction from the Web: An Ontology-Based Method Using Inductive Logic Programming // Proceedings of the 2013 IEEE 25th International Conference on Tools with Artificial Intelligence. Herndon, VA, USA, 4–6 November 2013. P. 951–958.</mixed-citation><mixed-citation xml:lang="en">Lima R., Espinasse B., Oliveira H., Pentagrossa L., Freitas F. Information Extraction from the Web: An Ontology-Based Method Using Inductive Logic Programming // Proceedings of the 2013 IEEE 25th International Conference on Tools with Artificial Intelligence. Herndon, VA, USA, 4–6 November 2013. P. 951–958.</mixed-citation></citation-alternatives></ref><ref id="cit21"><label>21</label><citation-alternatives><mixed-citation xml:lang="ru">Zheng S., Song R., Wen J.-R. Template-Independent News Extraction Based on Visual Consistency // Proceedings of the 22nd National Conference on Artificial Intelligence. Vancouver, BC, Canada, 22–26 July 2007. Washington, DC, USA: AAAI Press, 2007. P. 
1507–1512.</mixed-citation><mixed-citation xml:lang="en">Zheng S., Song R., Wen J.-R. Template-Independent News Extraction Based on Visual Consistency // Proceedings of the 22nd National Conference on Artificial Intelligence. Vancouver, BC, Canada, 22–26 July 2007. Washington, DC, USA: AAAI Press, 2007. P. 1507–1512.</mixed-citation></citation-alternatives></ref><ref id="cit22"><label>22</label><citation-alternatives><mixed-citation xml:lang="ru">Zhu W., Dai S., Song Y., Lu Z. Extracting news content with visual unit of web pages // Proceedings of the 2015 IEEE/ACIS 16th International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD). Takamatsu, Japan, 1–3 June 2015. P. 1–5.</mixed-citation><mixed-citation xml:lang="en">Zhu W., Dai S., Song Y., Lu Z. Extracting news content with visual unit of web pages // Proceedings of the 2015 IEEE/ACIS 16th International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD). Takamatsu, Japan, 1–3 June 2015. P. 1–5.</mixed-citation></citation-alternatives></ref><ref id="cit23"><label>23</label><citation-alternatives><mixed-citation xml:lang="ru">Gupta S., Kaiser G., Neistadt D., Grimm P. DOM-based content extraction of HTML documents // Proceedings of the 12th International Conference on World Wide Web. Budapest, Hungary, 20–24 May 2003. P. 207–214.</mixed-citation><mixed-citation xml:lang="en">Gupta S., Kaiser G., Neistadt D., Grimm P. DOM-based content extraction of HTML documents // Proceedings of the 12th International Conference on World Wide Web. Budapest, Hungary, 20–24 May 2003. P. 207–214.</mixed-citation></citation-alternatives></ref><ref id="cit24"><label>24</label><citation-alternatives><mixed-citation xml:lang="ru">Mirzaaghaei M., Mesbah A. DOM-based test adequacy criteria for web applications // Proceedings of the 2014 International Symposium on Software Testing and Analysis. 
San Jose, CA, USA, 21–26 July 2014. P. 71–81.</mixed-citation><mixed-citation xml:lang="en">Mirzaaghaei M., Mesbah A. DOM-based test adequacy criteria for web applications // Proceedings of the 2014 International Symposium on Software Testing and Analysis. San Jose, CA, USA, 21–26 July 2014. P. 71–81.</mixed-citation></citation-alternatives></ref><ref id="cit25"><label>25</label><citation-alternatives><mixed-citation xml:lang="ru">Lin J. Divergence Measures Based on the Shannon Entropy // IEEE Transactions on Information Theory. 1991. Vol. 37, No. 1. P. 145–151. https://doi.org/10.1109/18.61115.</mixed-citation><mixed-citation xml:lang="en">Lin J. Divergence Measures Based on the Shannon Entropy // IEEE Transactions on Information Theory. 1991. Vol. 37, No. 1. P. 145–151. https://doi.org/10.1109/18.61115.</mixed-citation></citation-alternatives></ref><ref id="cit26"><label>26</label><citation-alternatives><mixed-citation xml:lang="ru">Corander J., Remes U., Koski T. On the Jensen-Shannon divergence and the variation distance for categorical probability distributions // Kybernetika. 2021. Vol. 57. P. 879–907.</mixed-citation><mixed-citation xml:lang="en">Corander J., Remes U., Koski T. On the Jensen-Shannon divergence and the variation distance for categorical probability distributions // Kybernetika. 2021. Vol. 57. P. 879–907.</mixed-citation></citation-alternatives></ref><ref id="cit27"><label>27</label><citation-alternatives><mixed-citation xml:lang="ru">Nielsen F. Jensen–Shannon divergence and diversity index: Origins and some extensions. Preprint. 2021.</mixed-citation><mixed-citation xml:lang="en">Nielsen F. Jensen–Shannon divergence and diversity index: Origins and some extensions. Preprint. 2021.</mixed-citation></citation-alternatives></ref><ref id="cit28"><label>28</label><citation-alternatives><mixed-citation xml:lang="ru">Menéndez M.L., Pardo J.A., Pardo L., Pardo M.C. The Jensen–Shannon divergence // Journal of the Franklin Institute. 1997. Vol. 334. P. 
307–318.</mixed-citation><mixed-citation xml:lang="en">Menéndez M.L., Pardo J.A., Pardo L., Pardo M.C. The Jensen–Shannon divergence // Journal of the Franklin Institute. 1997. Vol. 334. P. 307–318.</mixed-citation></citation-alternatives></ref><ref id="cit29"><label>29</label><citation-alternatives><mixed-citation xml:lang="ru">Qwen Team. Qwen3-Coder: GitHub repository. URL: https://github.com/QwenLM/Qwen3-Coder (access date: 11.11.2025).</mixed-citation><mixed-citation xml:lang="en">Qwen Team. Qwen3-Coder: GitHub repository. URL: https://github.com/QwenLM/Qwen3-Coder (access date: 11.11.2025).</mixed-citation></citation-alternatives></ref></ref-list><fn-group><fn fn-type="conflict"><p>The authors declare no conflicts of interest.</p></fn></fn-group></back></article>
