<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.3 20210610//EN" "JATS-journalpublishing1-3.dtd">
<article article-type="research-article" dtd-version="1.3" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xml:lang="ru"><front><journal-meta><journal-id journal-id-type="publisher-id">ellibs</journal-id><journal-title-group><journal-title xml:lang="ru">Электронные библиотеки</journal-title><trans-title-group xml:lang="en"><trans-title>Russian Digital Libraries Journal</trans-title></trans-title-group></journal-title-group><issn pub-type="epub">1562-5419</issn><publisher><publisher-name>Казанский (Приволжский) федеральный университет</publisher-name></publisher></journal-meta><article-meta><article-id pub-id-type="doi">10.26907/1562-5419-2025-28-5-1120-1137</article-id><article-id custom-type="elpub" pub-id-type="custom">ellibs-612</article-id><article-categories><subj-group subj-group-type="heading"><subject>Research Article</subject></subj-group><subj-group subj-group-type="section-heading" xml:lang="ru"><subject>Статьи</subject></subj-group></article-categories><title-group><article-title>Абстрактивная суммаризация новостей внешней торговли на основе нового специализированного корпуса данных</article-title><trans-title-group xml:lang="en"><trans-title>Abstractive Summarization for Trade News Analysis Based on a New Domain-Specific Dataset</trans-title></trans-title-group></title-group><contrib-group><contrib contrib-type="author" corresp="yes"><name-alternatives><name name-style="eastern" xml:lang="ru"><surname>Лютова</surname><given-names>Дарья Андреевна</given-names></name><name name-style="western" xml:lang="en"><surname>Lyutova</surname><given-names>Daria Andreevna</given-names></name></name-alternatives><email xlink:type="simple">lyutovad@gmail.com</email><xref ref-type="aff" rid="aff-1"/></contrib><contrib contrib-type="author" corresp="yes"><name-alternatives><name name-style="eastern" xml:lang="ru"><surname>Малых</surname><given-names>Валентин Андреевич</given-names></name><name name-style="western" xml:lang="en"><surname>Malykh</surname><given-names>Valentin Andreevich</given-names></name></name-alternatives><email xlink:type="simple">valentin.malykh@phystech.edu</email><xref ref-type="aff" rid="aff-2"/></contrib></contrib-group><aff-alternatives id="aff-1"><aff xml:lang="ru"><institution>Всероссийская академия внешней торговли</institution></aff><aff xml:lang="en"><institution>Russian Foreign Trade Academy</institution></aff></aff-alternatives><aff-alternatives id="aff-2"><aff xml:lang="ru"><institution>Университет ИТМО</institution></aff><aff xml:lang="en"><institution>ITMO University</institution></aff></aff-alternatives><pub-date pub-type="collection"><year>2025</year></pub-date><pub-date pub-type="epub"><day>19</day><month>12</month><year>2025</year></pub-date><volume>28</volume><issue>5</issue><fpage>1120</fpage><lpage>1137</lpage><permissions><copyright-statement>Copyright © Лютова Д.А., Малых В.А., 2025</copyright-statement><copyright-year>2025</copyright-year><copyright-holder xml:lang="ru">Лютова Д.А., Малых В.А.</copyright-holder><copyright-holder xml:lang="en">Lyutova D.A., Malykh V.A.</copyright-holder><license xml:lang="ru" license-type="creative-commons-attribution" xlink:href="https://creativecommons.org/licenses/by/4.0/" xlink:type="simple"><license-p>Данная работа распространяется под лицензией Creative Commons Attribution 4.0.</license-p></license><license xml:lang="en" license-type="creative-commons-attribution" xlink:href="https://creativecommons.org/licenses/by/4.0/" xlink:type="simple"><license-p>This work is licensed under a Creative Commons Attribution 4.0 License.</license-p></license></permissions><self-uri xlink:href="https://ellibs.elpub.ru/jour/article/view/612">https://ellibs.elpub.ru/jour/article/view/612</self-uri>
<abstract><p>Представлен корпус TradeNewsSum для абстрактивной генерации аннотаций к новостям внешней торговли, охватывающий русско- и англоязычные публикации из профильных источников. Все рефераты подготовлены вручную по унифицированным правилам. Проведены эксперименты с дообучением трансформерных и seq2seq-моделей, а также автоматическая оценка результатов по схеме LLM-as-a-judge. Наилучшие результаты показала LLaMA 3.1 в режиме инструкционного промптинга, продемонстрировав высокие значения по используемым метрикам, включая фактологическую полноту.</p></abstract>
<trans-abstract xml:lang="en"><p>We present TradeNewsSum, a corpus for abstractive summarization of international trade news that covers Russian- and English-language publications from domain-specific sources. All summaries were prepared manually following unified guidelines. We ran fine-tuning experiments with transformer and seq2seq models and evaluated the generated summaries automatically using the LLM-as-a-judge scheme. LLaMA 3.1 with instruction prompting achieved the best results, scoring highly on the evaluated metrics, including factual completeness.
</p></trans-abstract><kwd-group xml:lang="ru"><kwd>абстрактивное реферирование</kwd><kwd>многоязычный корпус</kwd><kwd>новости внешней торговли</kwd><kwd>санкции</kwd><kwd>торговые режимы</kwd><kwd>TradeNewsSum</kwd><kwd>трансформеры</kwd><kwd>большие языковые модели</kwd><kwd>LLM-as-a-judge</kwd><kwd>NER-оценка сущностей</kwd></kwd-group><kwd-group xml:lang="en"><kwd>abstractive summarization</kwd><kwd>multilingual corpus</kwd><kwd>international trade news</kwd><kwd>sanctions</kwd><kwd>trade regimes</kwd><kwd>TradeNewsSum</kwd><kwd>transformers</kwd><kwd>large language models</kwd><kwd>LLM-as-a-judge</kwd><kwd>NER-based entity evaluation</kwd></kwd-group></article-meta></front><back><ref-list><title>References</title>
<ref id="cit1"><label>1</label><citation-alternatives><mixed-citation xml:lang="ru">Bahdanau D. et al. End-to-end attention-based large vocabulary speech recognition // 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2016. P. 4945–4949.</mixed-citation><mixed-citation xml:lang="en">Bahdanau D. et al. End-to-end attention-based large vocabulary speech recognition // 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2016. P. 4945–4949.</mixed-citation></citation-alternatives></ref>
<ref id="cit2"><label>2</label><citation-alternatives><mixed-citation xml:lang="ru">Banerjee S., Lavie A. METEOR: An automatic metric for MT evaluation with improved correlation with human judgments // Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization. 2005. P. 65–72.</mixed-citation><mixed-citation xml:lang="en">Banerjee S., Lavie A. METEOR: An automatic metric for MT evaluation with improved correlation with human judgments // Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization. 2005. P. 65–72.</mixed-citation></citation-alternatives></ref>
<ref id="cit3"><label>3</label><citation-alternatives><mixed-citation xml:lang="ru">Fabbri A.R. et al. Multi-News: A large-scale multi-document summarization dataset and abstractive hierarchical model // arXiv preprint arXiv:1906.01749. 2019.</mixed-citation><mixed-citation xml:lang="en">Fabbri A.R. et al. Multi-News: A large-scale multi-document summarization dataset and abstractive hierarchical model // arXiv preprint arXiv:1906.01749. 2019.</mixed-citation></citation-alternatives></ref>
<ref id="cit4"><label>4</label><citation-alternatives><mixed-citation xml:lang="ru">Fischer T., Remus S., Biemann C. Measuring faithfulness of abstractive summaries // Proceedings of the 18th Conference on Natural Language Processing (KONVENS 2022). 2022. P. 63–73.</mixed-citation><mixed-citation xml:lang="en">Fischer T., Remus S., Biemann C. Measuring faithfulness of abstractive summaries // Proceedings of the 18th Conference on Natural Language Processing (KONVENS 2022). 2022. P. 63–73.</mixed-citation></citation-alternatives></ref>
<ref id="cit5"><label>5</label><citation-alternatives><mixed-citation xml:lang="ru">Fu J. et al. GPTScore: Evaluate as you desire // arXiv preprint arXiv:2302.04166. 2023.</mixed-citation><mixed-citation xml:lang="en">Fu J. et al. GPTScore: Evaluate as you desire // arXiv preprint arXiv:2302.04166. 2023.</mixed-citation></citation-alternatives></ref>
<ref id="cit6"><label>6</label><citation-alternatives><mixed-citation xml:lang="ru">Gavrilov D., Kalaidin P., Malykh V. Self-attentive model for headline generation // Advances in Information Retrieval: 41st European Conference on IR Research, ECIR 2019, Cologne, Germany, April 14–18, 2019, Proceedings, Part II. Springer International Publishing, 2019. P. 87–93.</mixed-citation><mixed-citation xml:lang="en">Gavrilov D., Kalaidin P., Malykh V. Self-attentive model for headline generation // Advances in Information Retrieval: 41st European Conference on IR Research, ECIR 2019, Cologne, Germany, April 14–18, 2019, Proceedings, Part II. Springer International Publishing, 2019. P. 87–93.</mixed-citation></citation-alternatives></ref>
<ref id="cit7"><label>7</label><citation-alternatives><mixed-citation xml:lang="ru">Goyal T., Li J.J., Durrett G. News summarization and evaluation in the era of GPT-3 // arXiv preprint arXiv:2209.12356. 2022.</mixed-citation><mixed-citation xml:lang="en">Goyal T., Li J.J., Durrett G. News summarization and evaluation in the era of GPT-3 // arXiv preprint arXiv:2209.12356. 2022.</mixed-citation></citation-alternatives></ref>
<ref id="cit8"><label>8</label><citation-alternatives><mixed-citation xml:lang="ru">Grusky M., Naaman M., Artzi Y. Newsroom: A dataset of 1.3 million summaries with diverse extractive strategies // arXiv preprint arXiv:1804.11283. 2018.</mixed-citation><mixed-citation xml:lang="en">Grusky M., Naaman M., Artzi Y. Newsroom: A dataset of 1.3 million summaries with diverse extractive strategies // arXiv preprint arXiv:1804.11283. 2018.</mixed-citation></citation-alternatives></ref>
<ref id="cit9"><label>9</label><citation-alternatives><mixed-citation xml:lang="ru">Gusev I. Dataset for automatic summarization of Russian news // Artificial Intelligence and Natural Language: 9th Conference, AINL 2020, Helsinki, Finland, October 7–9, 2020, Proceedings. Springer International Publishing, 2020. P. 122–134.</mixed-citation><mixed-citation xml:lang="en">Gusev I. Dataset for automatic summarization of Russian news // Artificial Intelligence and Natural Language: 9th Conference, AINL 2020, Helsinki, Finland, October 7–9, 2020, Proceedings. Springer International Publishing, 2020. P. 122–134.</mixed-citation></citation-alternatives></ref>
<ref id="cit10"><label>10</label><citation-alternatives><mixed-citation xml:lang="ru">Hasan T. et al. XL-Sum: Large-scale multilingual abstractive summarization for 44 languages // arXiv preprint arXiv:2106.13822. 2021.</mixed-citation><mixed-citation xml:lang="en">Hasan T. et al. XL-Sum: Large-scale multilingual abstractive summarization for 44 languages // arXiv preprint arXiv:2106.13822. 2021.</mixed-citation></citation-alternatives></ref>
<ref id="cit11"><label>11</label><citation-alternatives><mixed-citation xml:lang="ru">Kryściński W. et al. Neural text summarization: A critical evaluation // arXiv preprint arXiv:1908.08960. 2019.</mixed-citation><mixed-citation xml:lang="en">Kryściński W. et al. Neural text summarization: A critical evaluation // arXiv preprint arXiv:1908.08960. 2019.</mixed-citation></citation-alternatives></ref>
<ref id="cit12"><label>12</label><citation-alternatives><mixed-citation xml:lang="ru">Lewis M. et al. BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension // arXiv preprint arXiv:1910.13461. 2019.</mixed-citation><mixed-citation xml:lang="en">Lewis M. et al. BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension // arXiv preprint arXiv:1910.13461. 2019.</mixed-citation></citation-alternatives></ref>
<ref id="cit13"><label>13</label><citation-alternatives><mixed-citation xml:lang="ru">Liu Y. et al. G-Eval: NLG evaluation using GPT-4 with better human alignment // arXiv preprint arXiv:2303.16634. 2023.</mixed-citation><mixed-citation xml:lang="en">Liu Y. et al. G-Eval: NLG evaluation using GPT-4 with better human alignment // arXiv preprint arXiv:2303.16634. 2023.</mixed-citation></citation-alternatives></ref>
<ref id="cit14"><label>14</label><citation-alternatives><mixed-citation xml:lang="ru">Narayan S., Cohen S.B., Lapata M. Don't give me the details, just the summary! Topic-aware convolutional neural networks for extreme summarization // arXiv preprint arXiv:1808.08745. 2018.</mixed-citation><mixed-citation xml:lang="en">Narayan S., Cohen S.B., Lapata M. Don't give me the details, just the summary! Topic-aware convolutional neural networks for extreme summarization // arXiv preprint arXiv:1808.08745. 2018.</mixed-citation></citation-alternatives></ref>
<ref id="cit15"><label>15</label><citation-alternatives><mixed-citation xml:lang="ru">Paulus R., Xiong C., Socher R. A deep reinforced model for abstractive summarization // arXiv preprint arXiv:1705.04304. 2017.</mixed-citation><mixed-citation xml:lang="en">Paulus R., Xiong C., Socher R. A deep reinforced model for abstractive summarization // arXiv preprint arXiv:1705.04304. 2017.</mixed-citation></citation-alternatives></ref>
<ref id="cit16"><label>16</label><citation-alternatives><mixed-citation xml:lang="ru">Raffel C. et al. Exploring the limits of transfer learning with a unified text-to-text transformer // Journal of Machine Learning Research. 2020. Vol. 21, No. 140. P. 1–67.</mixed-citation><mixed-citation xml:lang="en">Raffel C. et al. Exploring the limits of transfer learning with a unified text-to-text transformer // Journal of Machine Learning Research. 2020. Vol. 21, No. 140. P. 1–67.</mixed-citation></citation-alternatives></ref>
<ref id="cit17"><label>17</label><citation-alternatives><mixed-citation xml:lang="ru">Rush A.M., Chopra S., Weston J. A neural attention model for abstractive sentence summarization // arXiv preprint arXiv:1509.00685. 2015.</mixed-citation><mixed-citation xml:lang="en">Rush A.M., Chopra S., Weston J. A neural attention model for abstractive sentence summarization // arXiv preprint arXiv:1509.00685. 2015.</mixed-citation></citation-alternatives></ref>
<ref id="cit18"><label>18</label><citation-alternatives><mixed-citation xml:lang="ru">Sandhaus E. The New York Times Annotated Corpus Overview [Electronic resource]. Philadelphia: Linguistic Data Consortium, 2008. (LDC Catalog No. LDC2008T19). Available at: https://gwern.net/doc/ai/dataset/2008-sandhaus.pdf (accessed 21.05.2025).</mixed-citation><mixed-citation xml:lang="en">Sandhaus E. The New York Times Annotated Corpus Overview [Electronic resource]. Philadelphia: Linguistic Data Consortium, 2008. (LDC Catalog No. LDC2008T19). Available at: https://gwern.net/doc/ai/dataset/2008-sandhaus.pdf (accessed 21.05.2025).</mixed-citation></citation-alternatives></ref>
<ref id="cit19"><label>19</label><citation-alternatives><mixed-citation xml:lang="ru">Scialom T. et al. MLSUM: The multilingual summarization corpus // arXiv preprint arXiv:2004.14900. 2020.</mixed-citation><mixed-citation xml:lang="en">Scialom T. et al. MLSUM: The multilingual summarization corpus // arXiv preprint arXiv:2004.14900. 2020.</mixed-citation></citation-alternatives></ref>
<ref id="cit20"><label>20</label><citation-alternatives><mixed-citation xml:lang="ru">See A., Liu P.J., Manning C.D. A Neural Attention Model for Abstractive Sentence Summarization [Electronic resource]. 2016.</mixed-citation><mixed-citation xml:lang="en">See A., Liu P.J., Manning C.D. A Neural Attention Model for Abstractive Sentence Summarization [Electronic resource]. 2016.</mixed-citation></citation-alternatives></ref>
<ref id="cit21"><label>21</label><citation-alternatives><mixed-citation xml:lang="ru">https://github.com/abisee/cnn-dailymail (accessed 07.04.2025).</mixed-citation><mixed-citation xml:lang="en">https://github.com/abisee/cnn-dailymail (accessed 07.04.2025).</mixed-citation></citation-alternatives></ref>
<ref id="cit22"><label>22</label><citation-alternatives><mixed-citation xml:lang="ru">See A., Liu P.J., Manning C.D. Get to the point: Summarization with pointer-generator networks // arXiv preprint arXiv:1704.04368. 2017.</mixed-citation><mixed-citation xml:lang="en">See A., Liu P.J., Manning C.D. Get to the point: Summarization with pointer-generator networks // arXiv preprint arXiv:1704.04368. 2017.</mixed-citation></citation-alternatives></ref>
<ref id="cit23"><label>23</label><citation-alternatives><mixed-citation xml:lang="ru">Varab D., Schluter N. MassiveSumm: a very large-scale, very multilingual, news summarisation dataset // Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. 2021. P. 10150–10161.</mixed-citation><mixed-citation xml:lang="en">Varab D., Schluter N. MassiveSumm: a very large-scale, very multilingual, news summarisation dataset // Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. 2021. P. 10150–10161.</mixed-citation></citation-alternatives></ref>
<ref id="cit24"><label>24</label><citation-alternatives><mixed-citation xml:lang="ru">Vaswani A. et al. Attention is all you need // Advances in Neural Information Processing Systems. 2017. Vol. 30.</mixed-citation><mixed-citation xml:lang="en">Vaswani A. et al. Attention is all you need // Advances in Neural Information Processing Systems. 2017. Vol. 30.</mixed-citation></citation-alternatives></ref>
<ref id="cit25"><label>25</label><citation-alternatives><mixed-citation xml:lang="ru">Xin L., Liutova D., Malykh V. Cross-Language Summarization in Russian and Chinese Using the Reinforcement Learning // International Conference on Analysis of Images, Social Networks and Texts. Cham: Springer Nature Switzerland, 2024. P. 179–192.</mixed-citation><mixed-citation xml:lang="en">Xin L., Liutova D., Malykh V. Cross-Language Summarization in Russian and Chinese Using the Reinforcement Learning // International Conference on Analysis of Images, Social Networks and Texts. Cham: Springer Nature Switzerland, 2024. P. 179–192.</mixed-citation></citation-alternatives></ref>
<ref id="cit26"><label>26</label><citation-alternatives><mixed-citation xml:lang="ru">Yutkin M. Lenta.Ru News Dataset [Electronic resource]. 2018. Available at: https://github.com/yutkin/Lenta.Ru-News-Dataset (accessed 04.05.2025).</mixed-citation><mixed-citation xml:lang="en">Yutkin M. Lenta.Ru News Dataset [Electronic resource]. 2018. Available at: https://github.com/yutkin/Lenta.Ru-News-Dataset (accessed 04.05.2025).</mixed-citation></citation-alternatives></ref>
<ref id="cit27"><label>27</label><citation-alternatives><mixed-citation xml:lang="ru">Zhang J. et al. PEGASUS: Pre-training with extracted gap-sentences for abstractive summarization // International Conference on Machine Learning. PMLR, 2020. P. 11328–11339.</mixed-citation><mixed-citation xml:lang="en">Zhang J. et al. PEGASUS: Pre-training with extracted gap-sentences for abstractive summarization // International Conference on Machine Learning. PMLR, 2020. P. 11328–11339.</mixed-citation></citation-alternatives></ref>
<ref id="cit28"><label>28</label><citation-alternatives><mixed-citation xml:lang="ru">Zhang T. et al. BERTScore: Evaluating text generation with BERT // arXiv preprint arXiv:1904.09675. 2019.</mixed-citation><mixed-citation xml:lang="en">Zhang T. et al. BERTScore: Evaluating text generation with BERT // arXiv preprint arXiv:1904.09675. 2019.</mixed-citation></citation-alternatives></ref>
</ref-list><fn-group><fn fn-type="conflict"><p>The authors declare no conflicts of interest.</p></fn></fn-group></back></article>
