<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.3 20210610//EN" "JATS-journalpublishing1-3.dtd">
<article article-type="research-article" dtd-version="1.3" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xml:lang="ru"><front><journal-meta><journal-id journal-id-type="publisher-id">ellibs</journal-id><journal-title-group><journal-title xml:lang="ru">Электронные библиотеки</journal-title><trans-title-group xml:lang="en"><trans-title>Russian Digital Libraries Journal</trans-title></trans-title-group></journal-title-group><issn pub-type="epub">1562-5419</issn><publisher><publisher-name>Казанский (Приволжский) федеральный университет</publisher-name></publisher></journal-meta><article-meta><article-id pub-id-type="doi">10.26907/1562-5419-2025-28-3-654-681</article-id><article-id custom-type="elpub" pub-id-type="custom">ellibs-580</article-id><article-categories><subj-group subj-group-type="heading"><subject>Research Article</subject></subj-group><subj-group subj-group-type="section-heading" xml:lang="ru"><subject>Статьи</subject></subj-group></article-categories><title-group><article-title>Методика сравнения программных решений распознавания текстов научных публикаций по качеству извлечения метаданных</article-title><trans-title-group xml:lang="en"><trans-title>Procedure for Comparing Text Recognition Software Solutions For Scientific Publications by the Quality of Metadata Extraction</trans-title></trans-title-group></title-group><contrib-group><contrib contrib-type="author" corresp="yes"><name-alternatives><name name-style="eastern" xml:lang="ru"><surname>Кузнецов</surname><given-names>Илия Игоревич</given-names></name><name name-style="western" xml:lang="en"><surname>Kuznetsov</surname><given-names>Ilia Igorevich</given-names></name></name-alternatives><email xlink:type="simple">iliya-kuznetsov@mail.ru</email><xref ref-type="aff" rid="aff-1"/></contrib><contrib contrib-type="author" corresp="yes"><name-alternatives><name name-style="eastern" 
xml:lang="ru"><surname>Новиков</surname><given-names>Олег Пантелеевич</given-names></name><name name-style="western" xml:lang="en"><surname>Novikov</surname><given-names>Oleg Panteleevich</given-names></name></name-alternatives><email xlink:type="simple">novikovop55@rambler.ru</email><xref ref-type="aff" rid="aff-1"/></contrib><contrib contrib-type="author" corresp="yes"><name-alternatives><name name-style="eastern" xml:lang="ru"><surname>Ильин</surname><given-names>Дмитрий Юрьевич</given-names></name><name name-style="western" xml:lang="en"><surname>Ilin</surname><given-names>Dmitry Yurievich</given-names></name></name-alternatives><email xlink:type="simple">i@dmitryilin.com</email><xref ref-type="aff" rid="aff-2"/></contrib></contrib-group><aff-alternatives id="aff-1"><aff xml:lang="ru"><institution>Российский государственный университет им. А.Н. Косыгина (Технологии. Дизайн. Искусство)</institution></aff><aff xml:lang="en"><institution>A. N. Kosygin Moscow State Textile University</institution></aff></aff-alternatives><aff-alternatives id="aff-2"><aff xml:lang="ru"><institution>МИРЭА – Российский технологический университет</institution></aff><aff xml:lang="en"><institution>MIREA – Russian Technological University</institution></aff></aff-alternatives><pub-date pub-type="collection"><year>2025</year></pub-date><pub-date pub-type="epub"><day>23</day><month>06</month><year>2025</year></pub-date><volume>28</volume><issue>3</issue><fpage>654</fpage><lpage>680</lpage><permissions><copyright-statement>Copyright &#x00A9; Кузнецов И.И., Новиков О.П., Ильин Д.Ю., 2025</copyright-statement><copyright-year>2025</copyright-year><copyright-holder xml:lang="ru">Кузнецов И.И., Новиков О.П., Ильин Д.Ю.</copyright-holder><copyright-holder xml:lang="en">Kuznetsov I.I., Novikov O.P., Ilin D.Y.</copyright-holder><license xml:lang="ru" license-type="creative-commons-attribution" xlink:href="https://creativecommons.org/licenses/by/4.0/" xlink:type="simple"><license-p>Данная 
работа распространяется под лицензией Creative Commons Attribution 4.0.</license-p></license><license xml:lang="en" license-type="creative-commons-attribution" xlink:href="https://creativecommons.org/licenses/by/4.0/" xlink:type="simple"><license-p>This work is licensed under a Creative Commons Attribution 4.0 License.</license-p></license></permissions><self-uri xlink:href="https://ellibs.elpub.ru/jour/article/view/580">https://ellibs.elpub.ru/jour/article/view/580</self-uri><abstract><p>Метаданные научных публикаций используются для построения каталогов, определения цитируемости публикаций и решения других задач. Автоматизация извлечения метаданных из PDF-файлов позволяет ускорить выполнение обозначенных задач, а от качества извлеченных данных зависит возможность их дальнейшего использования. Проанализированы существующие программные решения, в итоге отобраны три: GROBID, CERMINE, ScientificPdfParser. Предложена методика сравнения этих программных решений распознавания текстов научных публикаций по качеству извлечения метаданных. На основе методики проведен эксперимент по извлечению четырех типов метаданных (название, аннотация, дата публикации, имена авторов). Для сравнения программных решений использован набор из 112457 публикаций с разбиением на 23 предметные области, сформированный на основе данных Semantic Scholar. Приведен пример выбора эффективного программного решения извлечения метаданных в условиях заданных приоритетов для предметных областей и типов метаданных с использованием взвешенной суммы. Определено, что для приведенного примера CERMINE показывает эффективность на 10,5% выше, чем GROBID, и на 9,6% выше, чем ScientificPdfParser.
</p></abstract><trans-abstract xml:lang="en"><p>Metadata of scientific publications is used to build catalogs, measure how often publications are cited, and support other tasks. Automating metadata extraction from PDF files speeds up these tasks, and whether the extracted data can be reused depends on the quality of extraction. Existing software solutions were analyzed, and three were selected: GROBID, CERMINE, and ScientificPdfParser. A procedure is proposed for comparing these text recognition solutions for scientific publications by the quality of metadata extraction. Following the procedure, an experiment was conducted to extract four types of metadata (title, abstract, publication date, and author names). The comparison used a dataset of 112,457 publications in 23 subject areas, built from Semantic Scholar data. An example shows how a weighted sum can be used to choose an effective metadata extraction solution given priorities for subject areas and metadata types. In this example, CERMINE was 10.5% more effective than GROBID and 9.6% more effective than ScientificPdfParser.
</p></trans-abstract><kwd-group xml:lang="ru"><kwd>распознавание текста</kwd><kwd>научные публикации</kwd><kwd>метаданные</kwd><kwd>качество извлечения данных</kwd><kwd>методика</kwd></kwd-group><kwd-group xml:lang="en"><kwd>text recognition</kwd><kwd>scientific publications</kwd><kwd>metadata</kwd><kwd>data extraction quality</kwd><kwd>procedure</kwd></kwd-group></article-meta></front><back><ref-list><title>References</title><ref id="cit1"><label>1</label><citation-alternatives><mixed-citation xml:lang="ru">Qayyum F., Afzal M. T. Identification of important citations by exploiting research articles’ metadata and cue-terms from content // Scientometrics. 2019. Vol. 118. P. 21-43.</mixed-citation><mixed-citation xml:lang="en">Qayyum F., Afzal M. T. Identification of important citations by exploiting research articles’ metadata and cue-terms from content // Scientometrics. 2019. Vol. 118. P. 21-43.</mixed-citation></citation-alternatives></ref><ref id="cit2"><label>2</label><citation-alternatives><mixed-citation xml:lang="ru">Liu X., Zhang J., Guo C. Full‐text citation analysis: A new method to enhance scholarly networks // Journal of the American Society for Information Science and Technology. 2013. Vol. 64. No. 9. P. 1852-1863.</mixed-citation><mixed-citation xml:lang="en">Liu X., Zhang J., Guo C. Full‐text citation analysis: A new method to enhance scholarly networks // Journal of the American Society for Information Science and Technology. 2013. Vol. 64. No. 9. P. 1852-1863.</mixed-citation></citation-alternatives></ref><ref id="cit3"><label>3</label><citation-alternatives><mixed-citation xml:lang="ru">Saier T., Färber M. unarXive: a large scholarly data set with publications’ full-text, annotated in-text citations, and links to metadata // Scientometrics. 2020. Vol. 125. No. 3. P. 3085-3108.</mixed-citation><mixed-citation xml:lang="en">Saier T., Färber M. 
unarXive: a large scholarly data set with publications’ full-text, annotated in-text citations, and links to metadata // Scientometrics. 2020. Vol. 125. No. 3. P. 3085-3108.</mixed-citation></citation-alternatives></ref><ref id="cit4"><label>4</label><citation-alternatives><mixed-citation xml:lang="ru">Safder I. et al. Deep learning-based extraction of algorithmic metadata in full-text scholarly documents // Information processing &amp; management. 2020. Vol. 57. No. 6. P. 102269.</mixed-citation><mixed-citation xml:lang="en">Safder I. et al. Deep learning-based extraction of algorithmic metadata in full-text scholarly documents // Information processing &amp; management. 2020. Vol. 57. No. 6. P. 102269.</mixed-citation></citation-alternatives></ref><ref id="cit5"><label>5</label><citation-alternatives><mixed-citation xml:lang="ru">O’Leary N. A. et al. Exploring and retrieving sequence and metadata for species across the tree of life with NCBI Datasets // Scientific data. 2024. Vol. 11. No. 1. P. 732.</mixed-citation><mixed-citation xml:lang="en">O’Leary N. A. et al. Exploring and retrieving sequence and metadata for species across the tree of life with NCBI Datasets // Scientific data. 2024. Vol. 11. No. 1. P. 732.</mixed-citation></citation-alternatives></ref><ref id="cit6"><label>6</label><citation-alternatives><mixed-citation xml:lang="ru">Safder I., Hassan S. U. Bibliometric-enhanced information retrieval: a novel deep feature engineering approach for algorithm searching from full-text publications // Scientometrics. 2019. Vol. 119. P. 257-277.</mixed-citation><mixed-citation xml:lang="en">Safder I., Hassan S. U. Bibliometric-enhanced information retrieval: a novel deep feature engineering approach for algorithm searching from full-text publications // Scientometrics. 2019. Vol. 119. P. 
257-277.</mixed-citation></citation-alternatives></ref><ref id="cit7"><label>7</label><citation-alternatives><mixed-citation xml:lang="ru">Joshi B., Symeonidou A., Danish S.M., Hermsen F. An End-to-End Pipeline for Bibliography Extraction from Scientific Articles // Proceedings of the Second Workshop on Information Extraction from Scientific Publications. 2023. P. 101-106.</mixed-citation><mixed-citation xml:lang="en">Joshi B., Symeonidou A., Danish S.M., Hermsen F. An End-to-End Pipeline for Bibliography Extraction from Scientific Articles // Proceedings of the Second Workshop on Information Extraction from Scientific Publications. 2023. P. 101-106.</mixed-citation></citation-alternatives></ref><ref id="cit8"><label>8</label><citation-alternatives><mixed-citation xml:lang="ru">Ma A. et al. A deep-learning based citation count prediction model with paper metadata semantic features // Scientometrics. 2021. Vol. 126. No. 8. P. 6803-6823.</mixed-citation><mixed-citation xml:lang="en">Ma A. et al. A deep-learning based citation count prediction model with paper metadata semantic features // Scientometrics. 2021. Vol. 126. No. 8. P. 6803-6823.</mixed-citation></citation-alternatives></ref><ref id="cit9"><label>9</label><citation-alternatives><mixed-citation xml:lang="ru">Lo K. et al. PaperMage: A Unified Toolkit for Processing, Representing, and Manipulating Visually-Rich Scientific Documents // Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: System Demonstrations. 2023. P. 495-507.</mixed-citation><mixed-citation xml:lang="en">Lo K. et al. PaperMage: A Unified Toolkit for Processing, Representing, and Manipulating Visually-Rich Scientific Documents // Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: System Demonstrations. 2023. P. 495-507.</mixed-citation></citation-alternatives></ref><ref id="cit10"><label>10</label><citation-alternatives><mixed-citation xml:lang="ru">Po D. K. 
Similarity based information retrieval using Levenshtein distance algorithm // International Journal of Advances in Scientific Research and Engineering. 2020. Vol. 6. No. 04. P. 06-10.</mixed-citation><mixed-citation xml:lang="en">Po D. K. Similarity based information retrieval using Levenshtein distance algorithm // International Journal of Advances in Scientific Research and Engineering. 2020. Vol. 6. No. 04. P. 06-10.</mixed-citation></citation-alternatives></ref><ref id="cit11"><label>11</label><citation-alternatives><mixed-citation xml:lang="ru">Nurcahyawati V., Mustaffa Z. Online Media as a Price Monitor: Text Analysis using Text Extraction Technique and Jaro-Winkler Similarity Algorithm // 2020 Emerging Technology in Computing, Communication and Electronics (ETCCE). IEEE, 2020. P. 1-6.</mixed-citation><mixed-citation xml:lang="en">Nurcahyawati V., Mustaffa Z. Online Media as a Price Monitor: Text Analysis using Text Extraction Technique and Jaro-Winkler Similarity Algorithm // 2020 Emerging Technology in Computing, Communication and Electronics (ETCCE). IEEE, 2020. P. 1-6.</mixed-citation></citation-alternatives></ref><ref id="cit12"><label>12</label><citation-alternatives><mixed-citation xml:lang="ru">Foppiano L. et al. Automatic extraction of materials and properties from superconductors scientific literature // Science and Technology of Advanced Materials: Methods. 2023. Vol. 3. No. 1. P. 2153633.</mixed-citation><mixed-citation xml:lang="en">Foppiano L. et al. Automatic extraction of materials and properties from superconductors scientific literature // Science and Technology of Advanced Materials: Methods. 2023. Vol. 3. No. 1. P. 2153633.</mixed-citation></citation-alternatives></ref><ref id="cit13"><label>13</label><citation-alternatives><mixed-citation xml:lang="ru">Petersen T. et al. 
Geo-quantities: A framework for automatic extraction of measurements and spatial context from scientific documents // Proceedings of the 17th International Symposium on Spatial and Temporal Databases. 2021. P. 166-169.</mixed-citation><mixed-citation xml:lang="en">Petersen T. et al. Geo-quantities: A framework for automatic extraction of measurements and spatial context from scientific documents // Proceedings of the 17th International Symposium on Spatial and Temporal Databases. 2021. P. 166-169.</mixed-citation></citation-alternatives></ref><ref id="cit14"><label>14</label><citation-alternatives><mixed-citation xml:lang="ru">Chraibi A. et al. Extraction of measurements from medical reports // 10ème conférence Francophone en Gestion et Ingénierie des Systèmes Hospitaliers, GISEH2020. 2020.</mixed-citation><mixed-citation xml:lang="en">Chraibi A. et al. Extraction of measurements from medical reports // 10ème conférence Francophone en Gestion et Ingénierie des Systèmes Hospitaliers, GISEH2020. 2020.</mixed-citation></citation-alternatives></ref><ref id="cit15"><label>15</label><citation-alternatives><mixed-citation xml:lang="ru">Haviana S. F. C., Subroto I. M. I. Obtaining Reference’s Topic Congruity in Indonesian Publications using Machine Learning Approach // 2019 6th International Conference on Electrical Engineering, Computer Science and Informatics (EECSI). IEEE. 2019. P. 428-431.</mixed-citation><mixed-citation xml:lang="en">Haviana S. F. C., Subroto I. M. I. Obtaining Reference’s Topic Congruity in Indonesian Publications using Machine Learning Approach // 2019 6th International Conference on Electrical Engineering, Computer Science and Informatics (EECSI). IEEE. 2019. P. 428-431.</mixed-citation></citation-alternatives></ref><ref id="cit16"><label>16</label><citation-alternatives><mixed-citation xml:lang="ru">Ermakova L., Bordignon F., Turenne N., Noel M. Is the Abstract a Mere Teaser? 
Evaluating generosity of article abstracts in the environmental sciences // Frontiers in Research Metrics and Analytics. 2018. Vol. 3. P. 16.</mixed-citation><mixed-citation xml:lang="en">Ermakova L., Bordignon F., Turenne N., Noel M. Is the Abstract a Mere Teaser? Evaluating generosity of article abstracts in the environmental sciences // Frontiers in Research Metrics and Analytics. 2018. Vol. 3. P. 16.</mixed-citation></citation-alternatives></ref><ref id="cit17"><label>17</label><citation-alternatives><mixed-citation xml:lang="ru">El-Ebshihy A. et al. A platform for argumentative zoning annotation and scientific summarization // Proceedings of the 31st ACM International Conference on Information &amp; Knowledge Management. 2022. P. 4843-4847.</mixed-citation><mixed-citation xml:lang="en">El-Ebshihy A. et al. A platform for argumentative zoning annotation and scientific summarization // Proceedings of the 31st ACM International Conference on Information &amp; Knowledge Management. 2022. P. 4843-4847.</mixed-citation></citation-alternatives></ref><ref id="cit18"><label>18</label><citation-alternatives><mixed-citation xml:lang="ru">Choi W. et al. Building an annotated corpus for automatic metadata extraction from multilingual journal article references // PloS one. 2023. Vol. 18. No. 1. P. E0280637.</mixed-citation><mixed-citation xml:lang="en">Choi W. et al. Building an annotated corpus for automatic metadata extraction from multilingual journal article references // PloS one. 2023. Vol. 18. No. 1. P. E0280637.</mixed-citation></citation-alternatives></ref><ref id="cit19"><label>19</label><citation-alternatives><mixed-citation xml:lang="ru">Krause J. et al. Bootstrapping multilingual metadata extraction: a showcase in Cyrillic // Proceedings of the Second Workshop on Scholarly Document Processing. 2021. P. 66-72.</mixed-citation><mixed-citation xml:lang="en">Krause J. et al. 
Bootstrapping multilingual metadata extraction: a showcase in Cyrillic // Proceedings of the Second Workshop on Scholarly Document Processing. 2021. P. 66-72.</mixed-citation></citation-alternatives></ref><ref id="cit20"><label>20</label><citation-alternatives><mixed-citation xml:lang="ru">Shapiro I., Saier T., Färber M. Sequence Labeling for Citation Field Extraction from Cyrillic Script References // Proceedings of the Workshop on Scientific Document Understanding; co-located with 36th AAAI Conference on Artificial Intelligence (AAAI 2022). 2022.</mixed-citation><mixed-citation xml:lang="en">Shapiro I., Saier T., Färber M. Sequence Labeling for Citation Field Extraction from Cyrillic Script References // Proceedings of the Workshop on Scientific Document Understanding; co-located with 36th AAAI Conference on Artificial Intelligence (AAAI 2022). 2022.</mixed-citation></citation-alternatives></ref><ref id="cit21"><label>21</label><citation-alternatives><mixed-citation xml:lang="ru">Indrawati A., Yoganingrum A., Yuwono P. Evaluating the quality of the Indonesian scientific journal references using ParsCit, CERMINE and GROBID // Library Philosophy and Practice. 2019. P. 1-14.</mixed-citation><mixed-citation xml:lang="en">Indrawati A., Yoganingrum A., Yuwono P. Evaluating the quality of the Indonesian scientific journal references using ParsCit, CERMINE and GROBID // Library Philosophy and Practice. 2019. P. 1-14.</mixed-citation></citation-alternatives></ref><ref id="cit22"><label>22</label><citation-alternatives><mixed-citation xml:lang="ru">Meuschke N. et al. A benchmark of PDF information extraction tools using a multi-task and multi-domain evaluation framework for academic documents // International Conference on Information. Cham : Springer Nature Switzerland, 2023. P. 383-405.</mixed-citation><mixed-citation xml:lang="en">Meuschke N. et al. 
A benchmark of PDF information extraction tools using a multi-task and multi-domain evaluation framework for academic documents // International Conference on Information. Cham : Springer Nature Switzerland, 2023. P. 383-405.</mixed-citation></citation-alternatives></ref><ref id="cit23"><label>23</label><citation-alternatives><mixed-citation xml:lang="ru">Guo Z., Jin H. Reference metadata extraction from scientific papers // 2011 12th International Conference on Parallel and Distributed Computing, Applications and Technologies. IEEE. 2011. P. 45-49.</mixed-citation><mixed-citation xml:lang="en">Guo Z., Jin H. Reference metadata extraction from scientific papers // 2011 12th International Conference on Parallel and Distributed Computing, Applications and Technologies. IEEE. 2011. P. 45-49.</mixed-citation></citation-alternatives></ref><ref id="cit24"><label>24</label><citation-alternatives><mixed-citation xml:lang="ru">Beel J., Langer S., Genzmehr M., Muller C. Docear’s PDF inspector: title extraction from PDF files // Proceedings of the 13th ACM/IEEE-CS joint conference on Digital libraries. New York, NY, USA: ACM, 2013. P. 443–444.</mixed-citation><mixed-citation xml:lang="en">Beel J., Langer S., Genzmehr M., Muller C. Docear’s PDF inspector: title extraction from PDF files // Proceedings of the 13th ACM/IEEE-CS joint conference on Digital libraries. New York, NY, USA: ACM, 2013. P. 443–444.</mixed-citation></citation-alternatives></ref><ref id="cit25"><label>25</label><citation-alternatives><mixed-citation xml:lang="ru">Jensen Z. et al. A machine learning approach to zeolite synthesis enabled by automatic literature data extraction // ACS central science. 2019. Vol. 5. No. 5. P. 892-899.</mixed-citation><mixed-citation xml:lang="en">Jensen Z. et al. A machine learning approach to zeolite synthesis enabled by automatic literature data extraction // ACS central science. 2019. Vol. 5. No. 5. P. 
892-899.</mixed-citation></citation-alternatives></ref><ref id="cit26"><label>26</label><citation-alternatives><mixed-citation xml:lang="ru">Färber M., Albers A., Schüber F. Identifying used methods and datasets in scientific publications // Proceedings of the Workshop on Scientific Document Understanding co-located with 35th AAAI Conference on Artificial Intelligence (AAAI 2021). 2021.</mixed-citation><mixed-citation xml:lang="en">Färber M., Albers A., Schüber F. Identifying used methods and datasets in scientific publications // Proceedings of the Workshop on Scientific Document Understanding co-located with 35th AAAI Conference on Artificial Intelligence (AAAI 2021). 2021.</mixed-citation></citation-alternatives></ref><ref id="cit27"><label>27</label><citation-alternatives><mixed-citation xml:lang="ru">Suryawati E., Widyantoro D. H. Combination of heuristic, rule-based and machine learning for bibliography extraction // 2017 5th International Conference on Instrumentation, Communications, Information Technology, and Biomedical Engineering (ICICI-BME). IEEE. 2017. P. 276-281.</mixed-citation><mixed-citation xml:lang="en">Suryawati E., Widyantoro D. H. Combination of heuristic, rule-based and machine learning for bibliography extraction // 2017 5th International Conference on Instrumentation, Communications, Information Technology, and Biomedical Engineering (ICICI-BME). IEEE. 2017. P. 276-281.</mixed-citation></citation-alternatives></ref><ref id="cit28"><label>28</label><citation-alternatives><mixed-citation xml:lang="ru">Tkaczyk D. et al. CERMINE: automatic extraction of structured metadata from scientific literature // International Journal on Document Analysis and Recognition (IJDAR). 2015. Vol. 18. P. 317-335.</mixed-citation><mixed-citation xml:lang="en">Tkaczyk D. et al. CERMINE: automatic extraction of structured metadata from scientific literature // International Journal on Document Analysis and Recognition (IJDAR). 2015. Vol. 18. P. 
317-335.</mixed-citation></citation-alternatives></ref><ref id="cit29"><label>29</label><citation-alternatives><mixed-citation xml:lang="ru">Romary L., Lopez P. Grobid-information extraction from scientific publications // ERCIM News. 2015. Vol. 100.</mixed-citation><mixed-citation xml:lang="en">Romary L., Lopez P. Grobid-information extraction from scientific publications // ERCIM News. 2015. Vol. 100.</mixed-citation></citation-alternatives></ref><ref id="cit30"><label>30</label><citation-alternatives><mixed-citation xml:lang="ru">Councill I. G., Giles C. L., Kan M. Y. ParsCit: an Open-source CRF Reference String Parsing Package // Proceedings of the 6th International Conference on Language Resources and Evaluation, LREC 2008. 2008. Vol. 8. P. 661-667.</mixed-citation><mixed-citation xml:lang="en">Councill I. G., Giles C. L., Kan M. Y. ParsCit: an Open-source CRF Reference String Parsing Package // Proceedings of the 6th International Conference on Language Resources and Evaluation, LREC 2008. 2008. Vol. 8. P. 661-667.</mixed-citation></citation-alternatives></ref><ref id="cit31"><label>31</label><citation-alternatives><mixed-citation xml:lang="ru">Prasad A., Kaur M., Kan M. Y. Neural ParsCit: a deep learning-based reference string parser // International journal on digital libraries. 2018. Vol. 19. P. 323-337.</mixed-citation><mixed-citation xml:lang="en">Prasad A., Kaur M., Kan M. Y. Neural ParsCit: a deep learning-based reference string parser // International journal on digital libraries. 2018. Vol. 19. P. 323-337.</mixed-citation></citation-alternatives></ref><ref id="cit32"><label>32</label><citation-alternatives><mixed-citation xml:lang="ru">Constantin A., Pettifer S., Voronkov A. PDFX: fully-automated PDF-to-XML conversion of scientific literature // Proceedings of the 2013 ACM symposium on Document engineering. 2013. P. 177-180.</mixed-citation><mixed-citation xml:lang="en">Constantin A., Pettifer S., Voronkov A. 
PDFX: fully-automated PDF-to-XML conversion of scientific literature // Proceedings of the 2013 ACM symposium on Document engineering. 2013. P. 177-180.</mixed-citation></citation-alternatives></ref></ref-list><fn-group><fn fn-type="conflict"><p>The authors declare that there are no conflicts of interest.</p></fn></fn-group></back></article>
