<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.3 20210610//EN" "JATS-journalpublishing1-3.dtd">
<article article-type="research-article" dtd-version="1.3" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xml:lang="ru"><front><journal-meta><journal-id journal-id-type="publisher-id">ellibs</journal-id><journal-title-group><journal-title xml:lang="ru">Электронные библиотеки</journal-title><trans-title-group xml:lang="en"><trans-title>Russian Digital Libraries Journal</trans-title></trans-title-group></journal-title-group><issn pub-type="epub">1562-5419</issn><publisher><publisher-name>Казанский (Приволжский) федеральный университет</publisher-name></publisher></journal-meta><article-meta><article-id pub-id-type="doi">10.26907/1562-5419-2022-25-6-616-639</article-id><article-id custom-type="elpub" pub-id-type="custom">ellibs-399</article-id><article-categories><subj-group subj-group-type="heading"><subject>Research Article</subject></subj-group><subj-group subj-group-type="section-heading" xml:lang="ru"><subject>Статьи</subject></subj-group></article-categories><title-group><article-title>Семантическое аннотирование математических формул в PDF-документах</article-title><trans-title-group xml:lang="en"><trans-title>Semantic Annotation of Mathematical Formulas in PDF-Documents</trans-title></trans-title-group></title-group><contrib-group><contrib contrib-type="author" corresp="yes"><name-alternatives><name name-style="eastern" xml:lang="ru"><surname>Невзорова</surname><given-names>О. А.</given-names></name><name name-style="western" xml:lang="en"><surname>Nevzorova</surname><given-names>O. A.</given-names></name></name-alternatives><email xlink:type="simple">onevzoro@gmail.com</email><xref ref-type="aff" rid="aff-1"/></contrib><contrib contrib-type="author" corresp="yes"><name-alternatives><name name-style="eastern" xml:lang="ru"><surname>Николаев</surname><given-names>К. 
С.</given-names></name><name name-style="western" xml:lang="en"><surname>Nikolaev</surname><given-names>K. S.</given-names></name></name-alternatives><email xlink:type="simple">konnikolaeff@yandex.ru</email><xref ref-type="aff" rid="aff-1"/></contrib></contrib-group><aff-alternatives id="aff-1"><aff xml:lang="ru"><institution>Казанский (Приволжский) федеральный университет</institution></aff><aff xml:lang="en"><institution>Kazan (Volga region) Federal University</institution></aff></aff-alternatives><pub-date pub-type="collection"><year>2022</year></pub-date><pub-date pub-type="epub"><day>28</day><month>12</month><year>2022</year></pub-date><volume>25</volume><issue>6</issue><fpage>616</fpage><lpage>639</lpage><permissions><copyright-statement>Copyright © Невзорова О.А., Николаев К.С., 2022</copyright-statement><copyright-year>2022</copyright-year><copyright-holder xml:lang="ru">Невзорова О.А., Николаев К.С.</copyright-holder><copyright-holder xml:lang="en">Nevzorova O.A., Nikolaev K.S.</copyright-holder><license xml:lang="ru" license-type="creative-commons-attribution" xlink:href="https://creativecommons.org/licenses/by/4.0/" xlink:type="simple"><license-p>Данная работа распространяется под лицензией Creative Commons Attribution 4.0.</license-p></license><license xml:lang="en" license-type="creative-commons-attribution" xlink:href="https://creativecommons.org/licenses/by/4.0/" xlink:type="simple"><license-p>This work is licensed under a Creative Commons Attribution 4.0 License.</license-p></license></permissions><self-uri xlink:href="https://ellibs.elpub.ru/jour/article/view/399">https://ellibs.elpub.ru/jour/article/view/399</self-uri><abstract><p>Дан обзор существующих решений по семантическому анализу математических документов, а также описан метод автоматического семантического анализа документов, представленных в формате PDF. 
Разработанный метод позволяет выделять математические формулы внутри документа, анализировать их структуру, выполнять поиск локальных переменных формулы и их определений в документе, а также связывать переменные формулы и понятия из онтологии. Преимуществом разработанного метода перед другими существующими является независимость от разметки исходного PDF-документа, что расширяет область применения метода. Приведены оценки полноты, точности и F-меры для алгоритмов поиска переменных и связывания локальных переменных с формулами. Полученная семантическая разметка документа позволяет создавать коллекции документов, пригодных для сервиса семантического поиска формул, который является одним из сервисов цифровой библиотеки Lobachevskii-DML.
</p></abstract><trans-abstract xml:lang="en"><p>This article reviews existing solutions for the semantic analysis of mathematical documents and presents a method for the automatic semantic analysis of documents in PDF format. The method locates mathematical formulas in a document, analyzes their structure, searches the text for the local variables of each formula and their definitions, and links the formula variables to concepts from an ontology. Its advantage over existing methods is that it does not depend on the markup of the original PDF document, which widens its scope of application. We report recall, precision, and F-measure for the algorithms that find variables and link local variables to formulas. The resulting semantic markup makes it possible to build document collections suitable for the semantic formula search service, which is one of the services of the Lobachevskii-DML digital library.
</p></trans-abstract><kwd-group xml:lang="ru"><kwd>семантический анализ</kwd><kwd>обработка документов</kwd><kwd>научные журналы</kwd></kwd-group><kwd-group xml:lang="en"><kwd>PDF</kwd><kwd>Lobachevskii-DML</kwd></kwd-group></article-meta></front><back><ref-list><title>References</title><ref id="cit1"><label>1</label><citation-alternatives><mixed-citation xml:lang="ru">Nevzorova O., Zhiltsov N., Zaikin D., Zhibrik O., Kirillovich A., Nevzorov V., Birialtsev E. Bringing math to LOD: A semantic publishing platform prototype for scientific collections in mathematics // Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). 2013. Vol. 8218 LNCS. No. 1. P. 379–394.</mixed-citation><mixed-citation xml:lang="en">Nevzorova O., Zhiltsov N., Zaikin D., Zhibrik O., Kirillovich A., Nevzorov V., Birialtsev E. Bringing math to LOD: A semantic publishing platform prototype for scientific collections in mathematics // Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). 2013. Vol. 8218 LNCS. No. 1. P. 379–394.</mixed-citation></citation-alternatives></ref><ref id="cit2"><label>2</label><citation-alternatives><mixed-citation xml:lang="ru">Bertin M., Atanassova I. Hybrid Approach for the Semantic Processing of Scientific Papers // Semantic Publishing Challenge Track in 11th European Semantic Web Conference (ESWC 2014). 2014. P. 1–5.</mixed-citation><mixed-citation xml:lang="en">Bertin M., Atanassova I. Hybrid Approach for the Semantic Processing of Scientific Papers // Semantic Publishing Challenge Track in 11th European Semantic Web Conference (ESWC 2014). 2014. P. 1–5.</mixed-citation></citation-alternatives></ref><ref id="cit3"><label>3</label><citation-alternatives><mixed-citation xml:lang="ru">Ciancarini P., Di Iorio A., Nuzzolese A.G., Peroni S., Vitali F. 
Semantic annotation of scholarly documents and citations // Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). 2013. Vol. 8249 LNAI. P. 336–347.</mixed-citation><mixed-citation xml:lang="en">Ciancarini P., Di Iorio A., Nuzzolese A.G., Peroni S., Vitali F. Semantic annotation of scholarly documents and citations // Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). 2013. Vol. 8249 LNAI. P. 336–347.</mixed-citation></citation-alternatives></ref><ref id="cit4"><label>4</label><citation-alternatives><mixed-citation xml:lang="ru">Ronzano F., Del Bosque G.C., Saggion H. CEUR-WS proceedings: Towards the automatic generation of highly descriptive scholarly publishing linked datasets // Communications in Computer and Information Science. 2014. Vol. 475. P. 83–88.</mixed-citation><mixed-citation xml:lang="en">Ronzano F., Del Bosque G.C., Saggion H. CEUR-WS proceedings: Towards the automatic generation of highly descriptive scholarly publishing linked datasets // Communications in Computer and Information Science. 2014. Vol. 475. P. 83–88.</mixed-citation></citation-alternatives></ref><ref id="cit5"><label>5</label><citation-alternatives><mixed-citation xml:lang="ru">Ahmad R., Afzal M.T., Qadir M.A. Information extraction from PDF sources based on rule-based system using integrated formats // Communications in Computer and Information Science. 2016. Vol. 641. P. 293–308.</mixed-citation><mixed-citation xml:lang="en">Ahmad R., Afzal M.T., Qadir M.A. Information extraction from PDF sources based on rule-based system using integrated formats // Communications in Computer and Information Science. 2016. Vol. 641. P. 
293–308.</mixed-citation></citation-alternatives></ref><ref id="cit6"><label>6</label><citation-alternatives><mixed-citation xml:lang="ru">Greiner-Petter A., Youssef A., Ruas T., Miller B.R., Schubotz M., Aizawa A., Gipp B. Math-word embedding in math search and semantic extraction // Scientometrics. 2020. Vol. 125. No. 3. P. 3017–3046.</mixed-citation><mixed-citation xml:lang="en">Greiner-Petter A., Youssef A., Ruas T., Miller B.R., Schubotz M., Aizawa A., Gipp B. Math-word embedding in math search and semantic extraction // Scientometrics. 2020. Vol. 125. No. 3. P. 3017–3046.</mixed-citation></citation-alternatives></ref><ref id="cit7"><label>7</label><citation-alternatives><mixed-citation xml:lang="ru">Wolska M., Grigore M. Symbol declarations in mathematical writing // Proceedings of the 3rd Workshop on Digital Mathematics Libraries. 2010. P. 119–127.</mixed-citation><mixed-citation xml:lang="en">Wolska M., Grigore M. Symbol declarations in mathematical writing // Proceedings of the 3rd Workshop on Digital Mathematics Libraries. 2010. P. 119–127.</mixed-citation></citation-alternatives></ref><ref id="cit8"><label>8</label><citation-alternatives><mixed-citation xml:lang="ru">Líška M., Sojka P., Ružička M., Mravec P. Web interface and collection for mathematical retrieval WebMIaS and MREC // DML 2011 – Towards a Digital Mathematics Library, Proceedings. 2011. P. 77–84.</mixed-citation><mixed-citation xml:lang="en">Líška M., Sojka P., Ružička M., Mravec P. Web interface and collection for mathematical retrieval WebMIaS and MREC // DML 2011 – Towards a Digital Mathematics Library, Proceedings. 2011. P. 77–84.</mixed-citation></citation-alternatives></ref><ref id="cit9"><label>9</label><citation-alternatives><mixed-citation xml:lang="ru">Schubotz M., Greiner-Petter A., Scharpf P., Meuschke N., Cohl H.S., Gipp B. 
Improving the Representation and Conversion of Mathematical Formulae by Considering their Textual Context // Proceedings of the ACM/IEEE Joint Conference on Digital Libraries. New York, NY, USA: ACM. 2018. P. 233–242.</mixed-citation><mixed-citation xml:lang="en">Schubotz M., Greiner-Petter A., Scharpf P., Meuschke N., Cohl H.S., Gipp B. Improving the Representation and Conversion of Mathematical Formulae by Considering their Textual Context // Proceedings of the ACM/IEEE Joint Conference on Digital Libraries. New York, NY, USA: ACM. 2018. P. 233–242.</mixed-citation></citation-alternatives></ref><ref id="cit10"><label>10</label><citation-alternatives><mixed-citation xml:lang="ru">Nevzorova O., Kirillovich A., Nevzorov V., Nikolaev K. The semantic context models of mathematical formulas in scientific papers // CEUR Workshop Proceedings. 2018. Vol. 2277. P. 33–40.</mixed-citation><mixed-citation xml:lang="en">Nevzorova O., Kirillovich A., Nevzorov V., Nikolaev K. The semantic context models of mathematical formulas in scientific papers // CEUR Workshop Proceedings. 2018. Vol. 2277. P. 33–40.</mixed-citation></citation-alternatives></ref><ref id="cit11"><label>11</label><citation-alternatives><mixed-citation xml:lang="ru">Elizarov A.M., Kirillovich A.V., Lipachev E.K., Nevzorova O.A. OntoMathPRO: Ontology of Mathematical Knowledge // Dokl. Math. 2022. https://doi.org/10.1134/S1064562422700016</mixed-citation><mixed-citation xml:lang="en">Elizarov A.M., Kirillovich A.V., Lipachev E.K., Nevzorova O.A. OntoMathPRO: Ontology of Mathematical Knowledge // Dokl. Math. 2022. https://doi.org/10.1134/S1064562422700016</mixed-citation></citation-alternatives></ref></ref-list><fn-group><fn fn-type="conflict"><p>The authors declare that there are no conflicts of interest present.</p></fn></fn-group></back></article>
