<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.3 20210610//EN" "JATS-journalpublishing1-3.dtd">
<article article-type="research-article" dtd-version="1.3" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xml:lang="ru"><front><journal-meta><journal-id journal-id-type="publisher-id">ellibs</journal-id><journal-title-group><journal-title xml:lang="ru">Электронные библиотеки</journal-title><trans-title-group xml:lang="en"><trans-title>Russian Digital Libraries Journal</trans-title></trans-title-group></journal-title-group><issn pub-type="epub">1562-5419</issn><publisher><publisher-name>Казанский (Приволжский) федеральный университет</publisher-name></publisher></journal-meta><article-meta><article-id pub-id-type="doi">10.26907/1562-5419-2021-24-6-1006-1022</article-id><article-id custom-type="elpub" pub-id-type="custom">ellibs-306</article-id><article-categories><subj-group subj-group-type="heading"><subject>Research Article</subject></subj-group><subj-group subj-group-type="section-heading" xml:lang="ru"><subject>Статьи</subject></subj-group></article-categories><title-group><article-title>О модели поиска синонимов</article-title><trans-title-group xml:lang="en"><trans-title>On the Synonym Search Model</trans-title></trans-title-group></title-group><contrib-group><contrib contrib-type="author" corresp="yes"><name-alternatives><name name-style="eastern" xml:lang="ru"><surname>Атаева</surname><given-names>О. М.</given-names></name><name name-style="western" xml:lang="en"><surname>Ataeva</surname><given-names>O. M.</given-names></name></name-alternatives><email xlink:type="simple">oli@ultimeta.ru</email><xref ref-type="aff" rid="aff-1"/></contrib><contrib contrib-type="author" corresp="yes"><name-alternatives><name name-style="eastern" xml:lang="ru"><surname>Серебряков</surname><given-names>В. А.</given-names></name><name name-style="western" xml:lang="en"><surname>Serebriakov</surname><given-names>V. 
A.</given-names></name></name-alternatives><email xlink:type="simple">serebr@ultimeta.ru</email><xref ref-type="aff" rid="aff-1"/></contrib><contrib contrib-type="author" corresp="yes"><name-alternatives><name name-style="eastern" xml:lang="ru"><surname>Тучкова</surname><given-names>Н. П.</given-names></name><name name-style="western" xml:lang="en"><surname>Tuchkova</surname><given-names>N. P.</given-names></name></name-alternatives><email xlink:type="simple">natalia_tuchkova@mail.ru</email><xref ref-type="aff" rid="aff-1"/></contrib></contrib-group><aff-alternatives id="aff-1"><aff xml:lang="ru"><institution>Вычислительный центр им. А.А. Дородницына ФИЦ ИУ РАН</institution></aff><aff xml:lang="en"><institution>Dorodnicyn Computing Centre FRC CSC RAS</institution></aff></aff-alternatives><pub-date pub-type="collection"><year>2021</year></pub-date><pub-date pub-type="epub"><day>28</day><month>12</month><year>2021</year></pub-date><volume>24</volume><issue>6</issue><fpage>1006</fpage><lpage>1022</lpage><permissions><copyright-statement>Copyright &#x00A9; Атаева О.М., Серебряков В.А., Тучкова Н.П., 2021</copyright-statement><copyright-year>2021</copyright-year><copyright-holder xml:lang="ru">Атаева О.М., Серебряков В.А., Тучкова Н.П.</copyright-holder><copyright-holder xml:lang="en">Ataeva O.M., Serebriakov V.A., Tuchkova N.P.</copyright-holder><license xml:lang="ru" license-type="creative-commons-attribution" xlink:href="https://creativecommons.org/licenses/by/4.0/" xlink:type="simple"><license-p>Данная работа распространяется под лицензией Creative Commons Attribution 4.0.</license-p></license><license xml:lang="en" license-type="creative-commons-attribution" xlink:href="https://creativecommons.org/licenses/by/4.0/" xlink:type="simple"><license-p>This work is licensed under a Creative Commons Attribution 4.0 License.</license-p></license></permissions><self-uri 
xlink:href="https://ellibs.elpub.ru/jour/article/view/306">https://ellibs.elpub.ru/jour/article/view/306</self-uri><abstract><p>Рассмотрена задача нахождения наиболее релевантных документов в результате расширенного и уточненного запроса. Для ее решения предложены модель поиска и механизм предварительной обработки текста, а также совместное использование поисковой системы и модели, построенной на основе индекса с помощью алгоритмов word2vec для генерации расширенного запроса с синонимами и уточнения результатов поиска на основе подбора похожих документов в цифровой семантической библиотеке. В работе исследуется построение векторного представления документов применительно к массиву данных цифровой семантической библиотеки LibMeta. Решалась задача обогащения пользовательских запросов синонимами. При построении модели поиска совместно с алгоритмами word2vec использован подход «сначала индексация, затем обучение», что позволяет получить более точные результаты поиска. Обучение модели проводилось на базе контента библиотеки для предметной области «Математика». Приведены примеры расширенного запроса с использованием синонимов.
</p></abstract><trans-abstract xml:lang="en"><p>This paper considers the problem of finding the most relevant documents for an expanded and refined query. To solve it, a search model and a text preprocessing mechanism are proposed, together with the joint use of a search engine and a neural network model built from the index with word2vec algorithms. The model generates an expanded query with synonyms and refines search results by selecting similar documents in a digital semantic library. The paper investigates the construction of paragraph-based vector representations of documents for the data of the digital semantic library LibMeta. Each piece of text is labeled; both a whole document and its separate parts can be labeled. The problem of enriching user queries with synonyms is solved. In building the search model with word2vec algorithms, an "indexing first, then training" approach is used, which covers more information and gives more accurate search results. The model was trained on the library's mathematical content. Examples of training, query expansion, and search quality assessment using training and synonyms are given.
</p></trans-abstract><kwd-group xml:lang="ru"><kwd>модель поиска</kwd><kwd>алгоритм word2vec</kwd><kwd>синонимы</kwd><kwd>информационный запрос</kwd><kwd>расширение запроса</kwd></kwd-group><kwd-group xml:lang="en"><kwd>search model</kwd><kwd>word2vec algorithm</kwd><kwd>synonyms</kwd><kwd>information query</kwd><kwd>query extension</kwd></kwd-group></article-meta></front><back><ref-list><title>References</title><ref id="cit1"><label>1</label><citation-alternatives><mixed-citation xml:lang="ru">Baeza-Yates R., Ribeiro-Neto B. Modern Information Retrieval. ACM Press, New York, 1999. 518 p.</mixed-citation><mixed-citation xml:lang="en">Baeza-Yates R., Ribeiro-Neto B. Modern Information Retrieval. ACM Press, New York, 1999. 518 p.</mixed-citation></citation-alternatives></ref><ref id="cit2"><label>2</label><citation-alternatives><mixed-citation xml:lang="ru">Salton G. Introduction to Modern Information Retrieval. McGraw-Hill, 1983. 513 p.</mixed-citation><mixed-citation xml:lang="en">Salton G. Introduction to Modern Information Retrieval. McGraw-Hill, 1983. 513 p.</mixed-citation></citation-alternatives></ref><ref id="cit3"><label>3</label><citation-alternatives><mixed-citation xml:lang="ru">Blei D.M., Ng A.Y., Jordan M.I. Latent Dirichlet Allocation // Journal of Machine Learning Research. 2003. V. 3. P. 993–1022.</mixed-citation><mixed-citation xml:lang="en">Blei D.M., Ng A.Y., Jordan M.I. Latent Dirichlet Allocation // Journal of Machine Learning Research. 2003. V. 3. P. 993–1022.</mixed-citation></citation-alternatives></ref><ref id="cit4"><label>4</label><citation-alternatives><mixed-citation xml:lang="ru">Furnas G.W., Landauer T.K., Gomez L.M., Dumais S.T. The vocabulary problem in human-system communication // Commun. ACM. 1987. V. 30. No. 11. P. 964–971.</mixed-citation><mixed-citation xml:lang="en">Furnas G.W., Landauer T.K., Gomez L.M., Dumais S.T. The vocabulary problem in human-system communication // Commun. ACM. 1987. V. 30. No. 11. P. 
964–971.</mixed-citation></citation-alternatives></ref><ref id="cit5"><label>5</label><citation-alternatives><mixed-citation xml:lang="ru">Biswas G., Bezdek J., Oakman R.L. A knowledge-based approach to online document retrieval system design. In Proc. ACM SIGART Int. Symp. Methodol. Intell. Syst. 1986. P. 112–120.</mixed-citation><mixed-citation xml:lang="en">Biswas G., Bezdek J., Oakman R.L. A knowledge-based approach to online document retrieval system design. In Proc. ACM SIGART Int. Symp. Methodol. Intell. Syst. 1986. P. 112–120.</mixed-citation></citation-alternatives></ref><ref id="cit6"><label>6</label><citation-alternatives><mixed-citation xml:lang="ru">Мак-Каллок У.С., Питтс В. Логическое исчисление идей, относящихся к нервной активности // Автоматы. Под ред. К. Э. Шеннона и Дж. Маккарти. М.: Изд-во иностр. лит., 1956. С. 363–384 (Перевод английской статьи 1943 г.).</mixed-citation><mixed-citation xml:lang="en">Мак-Каллок У.С., Питтс В. Логическое исчисление идей, относящихся к нервной активности // Автоматы. Под ред. К. Э. Шеннона и Дж. Маккарти. М.: Изд-во иностр. лит., 1956. С. 363–384 (Перевод английской статьи 1943 г.).</mixed-citation></citation-alternatives></ref><ref id="cit7"><label>7</label><citation-alternatives><mixed-citation xml:lang="ru">Профессиональный информационно-аналитический ресурс, посвященный машинному обучению, распознаванию образов и интеллектуальному анализу данных. URL: http://www.machinelearning.ru/ (доступно 26.10.2021)</mixed-citation><mixed-citation xml:lang="en">Профессиональный информационно-аналитический ресурс, посвященный машинному обучению, распознаванию образов и интеллектуальному анализу данных. URL: http://www.machinelearning.ru/ (доступно 26.10.2021)</mixed-citation></citation-alternatives></ref><ref id="cit8"><label>8</label><citation-alternatives><mixed-citation xml:lang="ru">Гаврилова Т.А., Хорошевский В.Ф. Базы знаний интеллектуальных систем. СПб.: Питер, 2000. 
384 с.</mixed-citation><mixed-citation xml:lang="en">Гаврилова Т.А., Хорошевский В.Ф. Базы знаний интеллектуальных систем. СПб.: Питер, 2000. 384 с.</mixed-citation></citation-alternatives></ref><ref id="cit9"><label>9</label><citation-alternatives><mixed-citation xml:lang="ru">Атаева О.М., Серебряков В.А. Онтология цифровой семантической библиотеки LibMeta // Информатика и её применения. 2018. Т. 12. № 1. С. 2–10.</mixed-citation><mixed-citation xml:lang="en">Атаева О.М., Серебряков В.А. Онтология цифровой семантической библиотеки LibMeta // Информатика и её применения. 2018. Т. 12. № 1. С. 2–10.</mixed-citation></citation-alternatives></ref><ref id="cit10"><label>10</label><citation-alternatives><mixed-citation xml:lang="ru">Mikolov T., Chen K., Corrado G., Dean J. Efficient Estimation of Word Representations in Vector Space // Proceedings of Workshop at ICLR, 2013.</mixed-citation><mixed-citation xml:lang="en">Mikolov T., Chen K., Corrado G., Dean J. Efficient Estimation of Word Representations in Vector Space // Proceedings of Workshop at ICLR, 2013.</mixed-citation></citation-alternatives></ref><ref id="cit11"><label>11</label><citation-alternatives><mixed-citation xml:lang="ru">Mikolov T., Yih W.T., Zweig G. Linguistic Regularities in Continuous Space Word Representations // Proceedings of NAACL HLT, 2013.</mixed-citation><mixed-citation xml:lang="en">Mikolov T., Yih W.T., Zweig G. Linguistic Regularities in Continuous Space Word Representations // Proceedings of NAACL HLT, 2013.</mixed-citation></citation-alternatives></ref><ref id="cit12"><label>12</label><citation-alternatives><mixed-citation xml:lang="ru">Le Q., Mikolov T. Distributed Representations of Sentences and Documents // International Conference on Machine Learning. 2014. P. 1188–1196.</mixed-citation><mixed-citation xml:lang="en">Le Q., Mikolov T. Distributed Representations of Sentences and Documents // International Conference on Machine Learning. 2014. P. 
1188–1196.</mixed-citation></citation-alternatives></ref><ref id="cit13"><label>13</label><citation-alternatives><mixed-citation xml:lang="ru">Ataeva O.M., Serebryakov V.A., Tuchkova N.P. Using Applied Ontology to Saturate Semantic Relations // Lobachevskii Journal of Mathematics. 2021. V. 42. No. 8. P. 1776–1785.</mixed-citation><mixed-citation xml:lang="en">Ataeva O.M., Serebryakov V.A., Tuchkova N.P. Using Applied Ontology to Saturate Semantic Relations // Lobachevskii Journal of Mathematics. 2021. V. 42. No. 8. P. 1776–1785.</mixed-citation></citation-alternatives></ref><ref id="cit14"><label>14</label><citation-alternatives><mixed-citation xml:lang="ru">Voorhees E.M. Query expansion using lexical-semantic relations. 17th Annu. Int. ACM SIGIR Conf. Res. Develop. Inf. Retr., Dublin, Ireland, 1994.</mixed-citation><mixed-citation xml:lang="en">Voorhees E.M. Query expansion using lexical-semantic relations. 17th Annu. Int. ACM SIGIR Conf. Res. Develop. Inf. Retr., Dublin, Ireland, 1994.</mixed-citation></citation-alternatives></ref><ref id="cit15"><label>15</label><citation-alternatives><mixed-citation xml:lang="ru">Buckley C., Salton G., Allan J., Singhal A. Automatic query expansion using SMART: TREC 3, presented at the 3rd Text Retr. Conf. (TREC), 1995.</mixed-citation><mixed-citation xml:lang="en">Buckley C., Salton G., Allan J., Singhal A. Automatic query expansion using SMART: TREC 3, presented at the 3rd Text Retr. Conf. (TREC), 1995.</mixed-citation></citation-alternatives></ref><ref id="cit16"><label>16</label><citation-alternatives><mixed-citation xml:lang="ru">Efthimiadis E.N. Query expansion // Annu. Rev. Inf. Sci. Technol. 1996. V. 31. No. 5. P. 121–187.</mixed-citation><mixed-citation xml:lang="en">Efthimiadis E.N. Query expansion // Annu. Rev. Inf. Sci. Technol. 1996. V. 31. No. 5. P. 
121–187.</mixed-citation></citation-alternatives></ref></ref-list><fn-group><fn fn-type="conflict"><p>The authors declare that there are no conflicts of interest present.</p></fn></fn-group></back></article>
