<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.3 20210610//EN" "JATS-journalpublishing1-3.dtd">
<article article-type="research-article" dtd-version="1.3" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xml:lang="ru"><front><journal-meta><journal-id journal-id-type="publisher-id">ellibs</journal-id><journal-title-group><journal-title xml:lang="ru">Электронные библиотеки</journal-title><trans-title-group xml:lang="en"><trans-title>Russian Digital Libraries Journal</trans-title></trans-title-group></journal-title-group><issn pub-type="epub">1562-5419</issn><publisher><publisher-name>Казанский (Приволжский) федеральный университет</publisher-name></publisher></journal-meta><article-meta><article-id pub-id-type="doi">10.26907/1562-5419-2023-26-1-52-79</article-id><article-id custom-type="elpub" pub-id-type="custom">ellibs-365</article-id><article-categories><subj-group subj-group-type="heading"><subject>Research Article</subject></subj-group><subj-group subj-group-type="section-heading" xml:lang="ru"><subject>Статьи</subject></subj-group></article-categories><title-group><article-title>Как эмбеддинги имен сущностей влияют на качество выравнивания сущностей</article-title><trans-title-group xml:lang="en"><trans-title>How Entity Name Embeddings Affect the Quality of Entity Alignment</trans-title></trans-title-group></title-group><contrib-group><contrib contrib-type="author" corresp="yes"><name-alternatives><name name-style="eastern" xml:lang="ru"><surname>Гусев</surname><given-names>Д. И.</given-names></name><name name-style="western" xml:lang="en"><surname>Gusev</surname><given-names>D. I.</given-names></name></name-alternatives><email xlink:type="simple">d.gusev1@g.nsu.ru</email><xref ref-type="aff" rid="aff-1"/></contrib><contrib contrib-type="author" corresp="yes"><name-alternatives><name name-style="eastern" xml:lang="ru"><surname>Апанович</surname><given-names>З. 
В.</given-names></name><name name-style="western" xml:lang="en"><surname>Apanovich</surname><given-names>Z. V.</given-names></name></name-alternatives><email xlink:type="simple">apanovich@iis.nsk.su</email><xref ref-type="aff" rid="aff-2"/></contrib></contrib-group><aff-alternatives id="aff-1"><aff xml:lang="ru"><institution>Новосибирский государственный университет</institution></aff><aff xml:lang="en"><institution>Novosibirsk State University</institution></aff></aff-alternatives><aff-alternatives id="aff-2"><aff xml:lang="ru"><institution>Институт систем информатики им. А.П. Ершова Сибирского отделения Российской академии наук</institution></aff><aff xml:lang="en"><institution>A.P. Ershov Institute of Informatics Systems</institution></aff></aff-alternatives><pub-date pub-type="collection"><year>2023</year></pub-date><pub-date pub-type="epub"><day>28</day><month>02</month><year>2023</year></pub-date><volume>26</volume><issue>1</issue><fpage>52</fpage><lpage>79</lpage><permissions><copyright-statement>Copyright &#x00A9; Гусев Д.И., Апанович З.В., 2023</copyright-statement><copyright-year>2023</copyright-year><copyright-holder xml:lang="ru">Гусев Д.И., Апанович З.В.</copyright-holder><copyright-holder xml:lang="en">Gusev D.I., Apanovich Z.V.</copyright-holder><license xml:lang="ru" license-type="creative-commons-attribution" xlink:href="https://creativecommons.org/licenses/by/4.0/" xlink:type="simple"><license-p>Данная работа распространяется под лицензией Creative Commons Attribution 4.0.</license-p></license><license xml:lang="en" license-type="creative-commons-attribution" xlink:href="https://creativecommons.org/licenses/by/4.0/" xlink:type="simple"><license-p>This work is licensed under a Creative Commons Attribution 4.0 License.</license-p></license></permissions><self-uri xlink:href="https://ellibs.elpub.ru/jour/article/view/365">https://ellibs.elpub.ru/jour/article/view/365</self-uri><abstract><p>Алгоритмы установления соответствия между сущностями 
осуществляют поиск эквивалентных сущностей в разноязычных графах знаний. Данная проблема возникает, как правило, при интеграции разноязычных графов знаний. В настоящее время решение этой проблемы становится весьма актуальным для практического решения проблем импортозамещения, например, чтобы найти информацию о лекарствах, выпускаемых в разных странах под разными названиями, или же решить проблему поиска эквивалентных запчастей.

В настоящее время известно несколько библиотек с открытым кодом, которые объединяют известные алгоритмы выравнивания сущностей, а также тестовые наборы данных для различных языков. В данной работе описан русско-английский набор данных для экспериментов с несколькими популярными алгоритмами выравнивания сущностей. Особое внимание уделено методам генерации векторных представлений для имен сущностей. В частности, рассмотрены комбинации различных методов генерации векторных представлений (эмбеддингов) имен сущностей с известными алгоритмами выравнивания сущностей. Таблицы с результатами экспериментов дополнены визуализациями.
</p></abstract><trans-abstract xml:lang="en"><p>Cross-lingual entity alignment (EA) algorithms search for equivalent real-world entities in multilingual knowledge graphs. This problem arises, for example, when searching for drugs manufactured in different countries under different names, or when searching for imported equipment. Several open-source libraries currently collect implementations of entity alignment algorithms together with test datasets for various languages. This paper describes experiments with several popular entity alignment algorithms applied to a Russian-English dataset. In addition to translating entity names from Russian to English, we conducted experiments that combine various generators of entity name embeddings with various generators of relational embeddings. To obtain more detailed information about the results of the EA approaches, we also assessed them by entity type and by the number of relationships and attributes. These experiments allowed us to significantly improve the accuracy of several EA algorithms on the Russian-English dataset.
</p></trans-abstract><kwd-group xml:lang="ru"><kwd>разноязычные графы знаний</kwd><kwd>идентификация сущностей</kwd></kwd-group><kwd-group xml:lang="en"><kwd>cross-lingual entity alignment</kwd><kwd>knowledge graphs</kwd><kwd>relational embeddings</kwd><kwd>name embeddings</kwd></kwd-group></article-meta></front><back><ref-list><title>References</title><ref id="cit1"><label>1</label><citation-alternatives><mixed-citation xml:lang="ru">Sun Z., Zhang Q., Hu W., Wang C., Chen M., Akrami F. et al. A benchmarking study of embedding-based entity alignment for knowledge graphs // Proc. VLDB Endowment. 2020. Vol. 13. P. 2326–2340.</mixed-citation><mixed-citation xml:lang="en">Sun Z., Zhang Q., Hu W., Wang C., Chen M., Akrami F. et al. A benchmarking study of embedding-based entity alignment for knowledge graphs // Proc. VLDB Endowment. 2020. Vol. 13. P. 2326–2340.</mixed-citation></citation-alternatives></ref><ref id="cit2"><label>2</label><citation-alternatives><mixed-citation xml:lang="ru">Gnezdilova V.A., Apanovich Z.V., Russian-English dataset and comparative analysis of algorithms for cross-language embedding-based entity alignment // Journal of Physics: Conference Series. 2021. Vol. 2099.</mixed-citation><mixed-citation xml:lang="en">Gnezdilova V.A., Apanovich Z.V., Russian-English dataset and comparative analysis of algorithms for cross-language embedding-based entity alignment // Journal of Physics: Conference Series. 2021. Vol. 2099.</mixed-citation></citation-alternatives></ref><ref id="cit3"><label>3</label><citation-alternatives><mixed-citation xml:lang="ru">Zhang Q., Sun Z., Hu W., Chen M., Guo L. et al. Multi-view knowledge graph embedding for entity alignment // Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence. 2019. P. 5429–5435.</mixed-citation><mixed-citation xml:lang="en">Zhang Q., Sun Z., Hu W., Chen M., Guo L. et al. 
Multi-view knowledge graph embedding for entity alignment // Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence. 2019. P. 5429–5435.</mixed-citation></citation-alternatives></ref><ref id="cit4"><label>4</label><citation-alternatives><mixed-citation xml:lang="ru">Mikolov T., Chen K., Corrado G., Dean J. Efficient estimation of word representations in vector space, January 2013, URL: https://arxiv.org/abs/1301.3781.</mixed-citation><mixed-citation xml:lang="en">Mikolov T., Chen K., Corrado G., Dean J. Efficient estimation of word representations in vector space, January 2013, URL: https://arxiv.org/abs/1301.3781.</mixed-citation></citation-alternatives></ref><ref id="cit5"><label>5</label><citation-alternatives><mixed-citation xml:lang="ru">Bordes A., Usunier N., Garcia-Durán A., Weston J., Yakhnenko O. Translating embeddings for modeling multi-relational data // Proceedings of the 26th International Conference on Neural Information Processing Systems. 2013. Vol. 2. P. 2787–2795.</mixed-citation><mixed-citation xml:lang="en">Bordes A., Usunier N., Garcia-Durán A., Weston J., Yakhnenko O. Translating embeddings for modeling multi-relational data // Proceedings of the 26th International Conference on Neural Information Processing Systems. 2013. Vol. 2. P. 2787–2795.</mixed-citation></citation-alternatives></ref><ref id="cit6"><label>6</label><citation-alternatives><mixed-citation xml:lang="ru">Wu Y., Liu X., Feng Y., Wang Z., Yan R., Zhao D. Relation-aware entity alignment for heterogeneous knowledge graphs // Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence. 2019. P. 5278–5284.</mixed-citation><mixed-citation xml:lang="en">Wu Y., Liu X., Feng Y., Wang Z., Yan R., Zhao D. Relation-aware entity alignment for heterogeneous knowledge graphs // Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence. 2019. P. 
5278–5284.</mixed-citation></citation-alternatives></ref><ref id="cit7"><label>7</label><citation-alternatives><mixed-citation xml:lang="ru">Veličković P., Cucurull G., Casanova A., Romero A., Liò P., Bengio Y. Graph attention networks // ICLR. 2018. 12 p.</mixed-citation><mixed-citation xml:lang="en">Veličković P., Cucurull G., Casanova A., Romero A., Liò P., Bengio Y. Graph attention networks // ICLR. 2018. 12 p.</mixed-citation></citation-alternatives></ref><ref id="cit8"><label>8</label><citation-alternatives><mixed-citation xml:lang="ru">Wang Z., Lv Q., Lan X., Zhang Y. Cross-lingual knowledge graph alignment via graph convolutional networks // Proc. of the Conference on Empirical Methods in Natural Language Processing. 201., P. 349–357.</mixed-citation><mixed-citation xml:lang="en">Wang Z., Lv Q., Lan X., Zhang Y. Cross-lingual knowledge graph alignment via graph convolutional networks // Proc. of the Conference on Empirical Methods in Natural Language Processing. 201., P. 349–357.</mixed-citation></citation-alternatives></ref><ref id="cit9"><label>9</label><citation-alternatives><mixed-citation xml:lang="ru">Mao X., Wang W., Wu Y., Lan M. From alignment to assignment: frustratingly simple unsupervised entity alignment // Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. 2021. P. 2843–2853.</mixed-citation><mixed-citation xml:lang="en">Mao X., Wang W., Wu Y., Lan M. From alignment to assignment: frustratingly simple unsupervised entity alignment // Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. 2021. P. 2843–2853.</mixed-citation></citation-alternatives></ref><ref id="cit10"><label>10</label><citation-alternatives><mixed-citation xml:lang="ru">Xu K., Wang L., Yu M., Feng Y., Song Y., et al. Cross-lingual knowledge graph alignment via graph matching neural network // Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 2019. P. 
3156–3161.</mixed-citation><mixed-citation xml:lang="en">Xu K., Wang L., Yu M., Feng Y., Song Y., et al. Cross-lingual knowledge graph alignment via graph matching neural network // Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 2019. P. 3156–3161.</mixed-citation></citation-alternatives></ref><ref id="cit11"><label>11</label><citation-alternatives><mixed-citation xml:lang="ru">Pennington J., Socher R., Manning C.D. GloVe: Global Vectors for Word Representation // Conference on Empirical Methods in Natural Language. 2014. P. 1532–1543.</mixed-citation><mixed-citation xml:lang="en">Pennington J., Socher R., Manning C.D. GloVe: Global Vectors for Word Representation // Conference on Empirical Methods in Natural Language. 2014. P. 1532–1543.</mixed-citation></citation-alternatives></ref><ref id="cit12"><label>12</label><citation-alternatives><mixed-citation xml:lang="ru">Bojanowski P., Grave E., Joulin A., Mikolov T. Enriching word vectors with subword information // Transactions of the Association for Computational Linguistics. 2017. P. 135–146.</mixed-citation><mixed-citation xml:lang="en">Bojanowski P., Grave E., Joulin A., Mikolov T. Enriching word vectors with subword information // Transactions of the Association for Computational Linguistics. 2017. P. 135–146.</mixed-citation></citation-alternatives></ref><ref id="cit13"><label>13</label><citation-alternatives><mixed-citation xml:lang="ru">Devlin J., Chang M.-W., Lee K., Toutanova K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding // Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2019. Vol. 1. P. 4171–4186.</mixed-citation><mixed-citation xml:lang="en">Devlin J., Chang M.-W., Lee K., Toutanova K. 
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding // Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2019. Vol. 1. P. 4171–4186.</mixed-citation></citation-alternatives></ref><ref id="cit14"><label>14</label><citation-alternatives><mixed-citation xml:lang="ru">Fuglede B., Topsoe F. Jensen–Shannon divergence and Hilbert space embedding // Proceedings of the International Symposium on Information Theory, 2004. IEEE.</mixed-citation><mixed-citation xml:lang="en">Fuglede B., Topsoe F. Jensen–Shannon divergence and Hilbert space embedding // Proceedings of the International Symposium on Information Theory, 2004. IEEE.</mixed-citation></citation-alternatives></ref><ref id="cit15"><label>15</label><citation-alternatives><mixed-citation xml:lang="ru">Sun Z., Hu W., Zhang Q., Qu Y. Bootstrapping entity alignment with knowledge graph embedding // Proc. 27th International Joint Conference on Artificial Intelligence (IJCAI-18), P. 4396–4402.</mixed-citation><mixed-citation xml:lang="en">Sun Z., Hu W., Zhang Q., Qu Y. Bootstrapping entity alignment with knowledge graph embedding // Proc. 27th International Joint Conference on Artificial Intelligence (IJCAI-18), P. 4396–4402.</mixed-citation></citation-alternatives></ref><ref id="cit16"><label>16</label><citation-alternatives><mixed-citation xml:lang="ru">Guo L., Sun Z., Hu W. Learning to Exploit Long-term relational dependencies in knowledge graphs // Proceedings of the 36th International Conference on Machine Learning. 2019. Vol. 57. P. 2505–2514.</mixed-citation><mixed-citation xml:lang="en">Guo L., Sun Z., Hu W. Learning to Exploit Long-term relational dependencies in knowledge graphs // Proceedings of the 36th International Conference on Machine Learning. 2019. Vol. 57. P. 
2505–2514.</mixed-citation></citation-alternatives></ref><ref id="cit17"><label>17</label><citation-alternatives><mixed-citation xml:lang="ru">Maaten L. van der, Hinton G. Visualizing data using t-SNE // Journal of Machine Learning Research. 2008. Vol. 86. P. 2579–2605.</mixed-citation><mixed-citation xml:lang="en">Maaten L. van der, Hinton G. Visualizing data using t-SNE // Journal of Machine Learning Research. 2008. Vol. 86. P. 2579–2605.</mixed-citation></citation-alternatives></ref><ref id="cit18"><label>18</label><citation-alternatives><mixed-citation xml:lang="ru">Yang Z., Dai Z., Yang Y., Carbonell J., Salakhutdinov R. et al. XLNet: generalized autoregressive pretraining for language understanding // Proceedings of the 33rd International Conference on Neural Information Processing Systems. 2019. P. 5753–5763.</mixed-citation><mixed-citation xml:lang="en">Yang Z., Dai Z., Yang Y., Carbonell J., Salakhutdinov R. et al. XLNet: generalized autoregressive pretraining for language understanding // Proceedings of the 33rd International Conference on Neural Information Processing Systems. 2019. P. 5753–5763.</mixed-citation></citation-alternatives></ref><ref id="cit19"><label>19</label><citation-alternatives><mixed-citation xml:lang="ru">Feng F., Yang Y., Cer D., Arivazhagan N., Wang W. Language-agnostic BERT sentence embedding. 2020. URL: https://arxiv.org/abs/2007.01852.</mixed-citation><mixed-citation xml:lang="en">Feng F., Yang Y., Cer D., Arivazhagan N., Wang W. Language-agnostic BERT sentence embedding. 2020. URL: https://arxiv.org/abs/2007.01852.</mixed-citation></citation-alternatives></ref></ref-list><fn-group><fn fn-type="conflict"><p>The authors declare that there are no conflicts of interest present.</p></fn></fn-group></back></article>
