<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.3 20210610//EN" "JATS-journalpublishing1-3.dtd">
<article article-type="research-article" dtd-version="1.3" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xml:lang="ru"><front><journal-meta><journal-id journal-id-type="publisher-id">ellibs</journal-id><journal-title-group><journal-title xml:lang="ru">Электронные библиотеки</journal-title><trans-title-group xml:lang="en"><trans-title>Russian Digital Libraries Journal</trans-title></trans-title-group></journal-title-group><issn pub-type="epub">1562-5419</issn><publisher><publisher-name>Казанский (Приволжский) федеральный университет</publisher-name></publisher></journal-meta><article-meta><article-id custom-type="elpub" pub-id-type="custom">ellibs-717</article-id><article-categories><subj-group subj-group-type="heading"><subject>Research Article</subject></subj-group><subj-group subj-group-type="section-heading" xml:lang="ru"><subject>Статьи</subject></subj-group></article-categories><title-group><article-title>Типы эмбеддингов и их применение в интеллектуальной академической генеалогии</article-title><trans-title-group xml:lang="en"><trans-title>Types of Embeddings and their Application in Intellectual Academic Genealogy</trans-title></trans-title-group></title-group><contrib-group><contrib contrib-type="author" corresp="yes"><name-alternatives><name name-style="eastern" xml:lang="ru"><surname>Мариносян</surname><given-names>Андреас Хачатурович</given-names></name><name name-style="western" xml:lang="en"><surname>Marinosyan</surname><given-names>Andreas Khachaturovich</given-names></name></name-alternatives><email xlink:type="simple">a.marinosyan@yandex.ru</email><xref ref-type="aff" rid="aff-1"/></contrib></contrib-group><aff-alternatives id="aff-1"><aff xml:lang="ru"><institution>Московский городской педагогический университет</institution></aff><aff xml:lang="en"><institution>Moscow City University</institution></aff></aff-alternatives><pub-date 
pub-type="collection"><year>2026</year></pub-date><pub-date pub-type="epub"><day>04</day><month>03</month><year>2026</year></pub-date><volume>29</volume><issue>1</issue><fpage>240</fpage><lpage>261</lpage><permissions><copyright-statement>Copyright &#x00A9; Мариносян А.Х., 2026</copyright-statement><copyright-year>2026</copyright-year><copyright-holder xml:lang="ru">Мариносян А.Х.</copyright-holder><copyright-holder xml:lang="en">Marinosyan A.K.</copyright-holder><license xml:lang="ru" license-type="creative-commons-attribution" xlink:href="https://creativecommons.org/licenses/by/4.0/" xlink:type="simple"><license-p>Данная работа распространяется под лицензией Creative Commons Attribution 4.0.</license-p></license><license xml:lang="en" license-type="creative-commons-attribution" xlink:href="https://creativecommons.org/licenses/by/4.0/" xlink:type="simple"><license-p>This work is licensed under a Creative Commons Attribution 4.0 License.</license-p></license></permissions><self-uri xlink:href="https://ellibs.elpub.ru/jour/article/view/717">https://ellibs.elpub.ru/jour/article/view/717</self-uri><abstract><p>Рассмотрена проблема построения интерпретируемых векторных представлений научных текстов для задач интеллектуальной академической генеалогии. Предложена типология эмбеддингов, включающая три класса: статистические, выученные нейросетевые и структурированные символьные. Обоснована необходимость объединения достоинств нейросетевых (высокая семантическая точность) и символьных (интерпретируемость измерений) подходов. Для реализации такого гибридного подхода предложен алгоритм построения выученных символьных эмбеддингов путем регрессионного преобразования вектора внутреннего представления нейросетевой модели в интерпретируемый набор оценок.


Экспериментальная оценка алгоритма проведена на корпусе фрагментов авторефератов диссертаций по педагогическим наукам. Компактный трансформерный энкодер с регрессионной головой обучался воспроизводить тематические оценки, сгенерированные передовой генеративной языковой моделью. Сравнение шести режимов обучения (три типа регрессионной головы и два состояния энкодера) показало, что дообучение верхних слоев энкодера является ключевым фактором повышения качества. По результатам тестирования была выбрана наилучшая конфигурация, которая достигла коэффициента детерминации R² = 0.57 и точности определения трех наиболее релевантных концептов, равной 74%. Результаты подтверждают, что для определенного рода задач, в которых требуется формальное представление выходных данных, возможна аппроксимация поведения генеративной модели компактным энкодером с регрессионной головой при существенно меньших вычислительных затратах. В более широкой перспективе разработка алгоритмов построения выученных символьных эмбеддингов будет способствовать созданию такой модели формальной репрезентации научного знания, в которой конвергенция нейросетевых и символьных методов обеспечит как масштабируемость обработки научных текстов, так и интерпретируемость векторных представлений, кодирующих содержание.
</p></abstract><trans-abstract xml:lang="en"><p>The paper addresses the problem of constructing interpretable vector representations of scientific texts for intellectual academic genealogy. A typology of embeddings is proposed, comprising three classes: statistical, learned neural, and structured symbolic. The study argues for combining the strengths of neural embeddings (high semantic accuracy) with those of symbolic embeddings (interpretable dimensions). To operationalize this hybrid approach, an algorithm for constructing learned symbolic embeddings is introduced, which uses a regression-based mapping from a neural model’s internal representation to an interpretable vector of scores.


The approach is evaluated on a corpus of fragments from dissertation abstracts in pedagogy. A compact transformer encoder with a regression head was trained to reproduce topic relevance scores produced by a state-of-the-art generative language model. A comparison of six training setups (three regression-head architectures and two encoder settings) shows that fine-tuning the upper encoder layers is the primary driver of quality improvements. The best configuration achieves R² = 0.57 and a Top-3 accuracy of 74% in identifying the most relevant concepts. These results suggest that, for tasks requiring formalized output representations, a compact encoder with a regression head can approximate a generative model’s behavior at substantially lower computational cost. More broadly, the further development of algorithms for constructing learned symbolic embeddings contributes to building a model of formal knowledge representation in which the convergence of neural and symbolic methods ensures both the scalability of scientific text processing and the interpretability of vector representations that encode their content.
</p></trans-abstract><kwd-group xml:lang="ru"><kwd>эмбеддинги</kwd><kwd>академическая генеалогия</kwd><kwd>трансформерный энкодер</kwd><kwd>регрессионная голова</kwd><kwd>символьные эмбеддинги</kwd><kwd>тематический профиль</kwd><kwd>обработка естественного языка</kwd><kwd>интерпретируемость</kwd><kwd>большие языковые модели</kwd><kwd>наукометрия</kwd></kwd-group><kwd-group xml:lang="en"><kwd>embeddings</kwd><kwd>academic genealogy</kwd><kwd>transformer encoder</kwd><kwd>regression head</kwd><kwd>symbolic embeddings</kwd><kwd>topic profile</kwd><kwd>natural language processing</kwd><kwd>interpretability</kwd><kwd>large language models</kwd><kwd>scientometrics</kwd></kwd-group></article-meta></front><back><ref-list><title>References</title><ref id="cit1"><label>1</label><citation-alternatives><mixed-citation xml:lang="ru">Mulcahy C. The Mathematics Genealogy Project comes of age at twenty-one // Notices of the AMS. 2017. Vol. 64. No. 5. P. 466–470.</mixed-citation><mixed-citation xml:lang="en">Mulcahy C. The Mathematics Genealogy Project comes of age at twenty-one // Notices of the AMS. 2017. Vol. 64. No. 5. P. 466–470.</mixed-citation></citation-alternatives></ref><ref id="cit2"><label>2</label><citation-alternatives><mixed-citation xml:lang="ru">David S.V., Hayden B.Y. Neurotree: A Collaborative, Graphical Database of the Academic Genealogy of Neuroscience // PLoS ONE. 2012. Vol. 7. No. 10. e46608. https://doi.org/10.1371/journal.pone.0046608</mixed-citation><mixed-citation xml:lang="en">David S.V., Hayden B.Y. Neurotree: A Collaborative, Graphical Database of the Academic Genealogy of Neuroscience // PLoS ONE. 2012. Vol. 7. No. 10. e46608. https://doi.org/10.1371/journal.pone.0046608</mixed-citation></citation-alternatives></ref><ref id="cit3"><label>3</label><citation-alternatives><mixed-citation xml:lang="ru">Lerner I.M., Marinosyan A.Kh., Grigoriev S.G., Yusupov A.R., Anikieva M.A., Garifullina G.A. 
An Approach to the Formation of Intellectual Academic Genealogy Using Large Language Models // Journal Electromagnetic Waves and Electronic Systems. 2024. Vol. 29. No. 4. P. 108–120. https://doi.org/10.18127/j5604128-202404-09 (In Russ.)</mixed-citation><mixed-citation xml:lang="en">Lerner I.M., Marinosyan A.Kh., Grigoriev S.G., Yusupov A.R., Anikieva M.A., Garifullina G.A. An Approach to the Formation of Intellectual Academic Genealogy Using Large Language Models // Journal Electromagnetic Waves and Electronic Systems. 2024. Vol. 29. No. 4. P. 108–120. https://doi.org/10.18127/j5604128-202404-09 (In Russ.)</mixed-citation></citation-alternatives></ref><ref id="cit4"><label>4</label><citation-alternatives><mixed-citation xml:lang="ru">Grigoriev S.G., Lerner I.M., Marinosyan A.Kh., Grigorieva M.A. On the Issue of Educational and Methodological Information Selection for Implementing an Adaptive Learning Management System: Algorithm of A Priori Authors Classification // Informatics and Education / Informatika i obrazovanie. 2025. Vol. 40. No. 2. P. 66–78. https://doi.org/10.32517/0234-0453-2025-40-2-66-78 (In Russ.)</mixed-citation><mixed-citation xml:lang="en">Grigoriev S.G., Lerner I.M., Marinosyan A.Kh., Grigorieva M.A. On the Issue of Educational and Methodological Information Selection for Implementing an Adaptive Learning Management System: Algorithm of A Priori Authors Classification // Informatics and Education / Informatika i obrazovanie. 2025. Vol. 40. No. 2. P. 66–78. https://doi.org/10.32517/0234-0453-2025-40-2-66-78 (In Russ.)</mixed-citation></citation-alternatives></ref><ref id="cit5"><label>5</label><citation-alternatives><mixed-citation xml:lang="ru">Marinosyan A.Kh., Grigoriev S.G. Scientific Publications and the Embedding Space of Knowledge // Electronic Libraries / Russian Digital Library Journal. 2026. Vol. 29. No. 2. (In press.) (In Russ.)</mixed-citation><mixed-citation xml:lang="en">Marinosyan A.Kh., Grigoriev S.G. 
Scientific Publications and the Embedding Space of Knowledge // Electronic Libraries / Russian Digital Library Journal. 2026. Vol. 29. No. 2. (In press.) (In Russ.)</mixed-citation></citation-alternatives></ref><ref id="cit6"><label>6</label><citation-alternatives><mixed-citation xml:lang="ru">Salton G., Buckley C. Term-Weighting Approaches in Automatic Text Retrieval // Information Processing &amp; Management. 1988. Vol. 24. No. 5. P. 513–523. https://doi.org/10.1016/0306-4573(88)90021-0</mixed-citation><mixed-citation xml:lang="en">Salton G., Buckley C. Term-Weighting Approaches in Automatic Text Retrieval // Information Processing &amp; Management. 1988. Vol. 24. No. 5. P. 513–523. https://doi.org/10.1016/0306-4573(88)90021-0</mixed-citation></citation-alternatives></ref><ref id="cit7"><label>7</label><citation-alternatives><mixed-citation xml:lang="ru">Sparck Jones K. A Statistical Interpretation of Term Specificity and Its Application in Retrieval // Journal of Documentation. 1972. Vol. 28. No. 1. P. 11–21. https://doi.org/10.1108/eb026526</mixed-citation><mixed-citation xml:lang="en">Sparck Jones K. A Statistical Interpretation of Term Specificity and Its Application in Retrieval // Journal of Documentation. 1972. Vol. 28. No. 1. P. 11–21. https://doi.org/10.1108/eb026526</mixed-citation></citation-alternatives></ref><ref id="cit8"><label>8</label><citation-alternatives><mixed-citation xml:lang="ru">Mikolov T., Chen K., Corrado G., Dean J. Efficient Estimation of Word Representations in Vector Space // arXiv preprint. 2013. arXiv:1301.3781.</mixed-citation><mixed-citation xml:lang="en">Mikolov T., Chen K., Corrado G., Dean J. Efficient Estimation of Word Representations in Vector Space // arXiv preprint. 2013. arXiv:1301.3781.</mixed-citation></citation-alternatives></ref><ref id="cit9"><label>9</label><citation-alternatives><mixed-citation xml:lang="ru">Pennington J., Socher R., Manning C.D. 
GloVe: Global Vectors for Word Representation // Proceedings of EMNLP. 2014. P. 1532–1543.</mixed-citation><mixed-citation xml:lang="en">Pennington J., Socher R., Manning C.D. GloVe: Global Vectors for Word Representation // Proceedings of EMNLP. 2014. P. 1532–1543.</mixed-citation></citation-alternatives></ref><ref id="cit10"><label>10</label><citation-alternatives><mixed-citation xml:lang="ru">Devlin J., Chang M.-W., Lee K., Toutanova K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding // Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2019. Vol. 1. P. 4171–4186. https://doi.org/10.18653/v1/N19-1423</mixed-citation><mixed-citation xml:lang="en">Devlin J., Chang M.-W., Lee K., Toutanova K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding // Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2019. Vol. 1. P. 4171–4186. https://doi.org/10.18653/v1/N19-1423</mixed-citation></citation-alternatives></ref><ref id="cit11"><label>11</label><citation-alternatives><mixed-citation xml:lang="ru">Reimers N., Gurevych I. Sentence-BERT: Sentence Embeddings Using Siamese BERT-Networks // Proceedings of EMNLP. 2019. P. 3982–3992. https://doi.org/10.18653/v1/D19-1410</mixed-citation><mixed-citation xml:lang="en">Reimers N., Gurevych I. Sentence-BERT: Sentence Embeddings Using Siamese BERT-Networks // Proceedings of EMNLP. 2019. P. 3982–3992. https://doi.org/10.18653/v1/D19-1410</mixed-citation></citation-alternatives></ref><ref id="cit12"><label>12</label><citation-alternatives><mixed-citation xml:lang="ru">Beltagy I., Lo K., Cohan A. SciBERT: A Pretrained Language Model for Scientific Text // Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2019. P. 3615–3620. 
https://doi.org/10.18653/v1/D19-1371</mixed-citation><mixed-citation xml:lang="en">Beltagy I., Lo K., Cohan A. SciBERT: A Pretrained Language Model for Scientific Text // Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2019. P. 3615–3620. https://doi.org/10.18653/v1/D19-1371</mixed-citation></citation-alternatives></ref><ref id="cit13"><label>13</label><citation-alternatives><mixed-citation xml:lang="ru">Wang L., Yang N., Huang X., Yang L., Majumder R., Wei F. Multilingual E5 Text Embeddings: A Technical Report // arXiv preprint. 2024. arXiv:2402.05672.</mixed-citation><mixed-citation xml:lang="en">Wang L., Yang N., Huang X., Yang L., Majumder R., Wei F. Multilingual E5 Text Embeddings: A Technical Report // arXiv preprint. 2024. arXiv:2402.05672.</mixed-citation></citation-alternatives></ref><ref id="cit14"><label>14</label><citation-alternatives><mixed-citation xml:lang="ru">Marinosyan A.Kh., Grigoriev S.G., Lerner I.M., Anikieva M.A. Automated Comparison of Scientific Research Based on Academic Genealogy // Informatics and Education / Informatika i obrazovanie. 2025. Vol. 40. No. 6. P. 16–27. https://doi.org/10.32517/0234-0453-2025-40-6-16-27 (In Russ.)</mixed-citation><mixed-citation xml:lang="en">Marinosyan A.Kh., Grigoriev S.G., Lerner I.M., Anikieva M.A. Automated Comparison of Scientific Research Based on Academic Genealogy // Informatics and Education / Informatika i obrazovanie. 2025. Vol. 40. No. 6. P. 16–27. https://doi.org/10.32517/0234-0453-2025-40-6-16-27 (In Russ.)</mixed-citation></citation-alternatives></ref><ref id="cit15"><label>15</label><citation-alternatives><mixed-citation xml:lang="ru">Elizarov A.M., Kirillovich A.V., Lipachev E.K., Nevzorova O.A., Solovyev V.D., Zhiltsov N.G. Mathematical Knowledge Representation: Semantic Models and Formalisms // Lobachevskii Journal of Mathematics. 2014. Vol. 35. No. 4. P. 348–354. 
https://doi.org/10.1134/S1995080214040143</mixed-citation><mixed-citation xml:lang="en">Elizarov A.M., Kirillovich A.V., Lipachev E.K., Nevzorova O.A., Solovyev V.D., Zhiltsov N.G. Mathematical Knowledge Representation: Semantic Models and Formalisms // Lobachevskii Journal of Mathematics. 2014. Vol. 35. No. 4. P. 348–354. https://doi.org/10.1134/S1995080214040143</mixed-citation></citation-alternatives></ref><ref id="cit16"><label>16</label><citation-alternatives><mixed-citation xml:lang="ru">Vaswani A., Shazeer N., Parmar N., Uszkoreit J., Jones L., Gomez A.N., Kaiser Ł., Polosukhin I. Attention Is All You Need // Advances in Neural Information Processing Systems. 2017. Vol. 30. P. 5998–6008.</mixed-citation><mixed-citation xml:lang="en">Vaswani A., Shazeer N., Parmar N., Uszkoreit J., Jones L., Gomez A.N., Kaiser Ł., Polosukhin I. Attention Is All You Need // Advances in Neural Information Processing Systems. 2017. Vol. 30. P. 5998–6008.</mixed-citation></citation-alternatives></ref><ref id="cit17"><label>17</label><citation-alternatives><mixed-citation xml:lang="ru">Shimanaka H., Kajiwara T., Komachi M. Machine Translation Evaluation with BERT Regressor // arXiv preprint. 2019. arXiv:1907.12679.</mixed-citation><mixed-citation xml:lang="en">Shimanaka H., Kajiwara T., Komachi M. Machine Translation Evaluation with BERT Regressor // arXiv preprint. 2019. arXiv:1907.12679.</mixed-citation></citation-alternatives></ref><ref id="cit18"><label>18</label><citation-alternatives><mixed-citation xml:lang="ru">Viskov V., Kokush G., Larionov D., Eger S., Panchenko A. Semantically-Informed Regressive Encoder Score // Proceedings of the Eighth Conference on Machine Translation (WMT). 2023. P. 815–821. https://doi.org/10.18653/v1/2023.wmt-1.69</mixed-citation><mixed-citation xml:lang="en">Viskov V., Kokush G., Larionov D., Eger S., Panchenko A. Semantically-Informed Regressive Encoder Score // Proceedings of the Eighth Conference on Machine Translation (WMT). 2023. P. 
815–821. https://doi.org/10.18653/v1/2023.wmt-1.69</mixed-citation></citation-alternatives></ref><ref id="cit19"><label>19</label><citation-alternatives><mixed-citation xml:lang="ru">Gombert S., Menzel L., Di Mitri D., Drachsler H. Predicting Item Difficulty and Item Response Time with Scalar-Mixed Transformer Encoder Models and Rational Network Regression Heads // Proceedings of the 19th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2024). 2024. P. 483–492. URL: https://aclanthology.org/2024.bea-1.40/ (date accessed: 02.02.2026).</mixed-citation><mixed-citation xml:lang="en">Gombert S., Menzel L., Di Mitri D., Drachsler H. Predicting Item Difficulty and Item Response Time with Scalar-Mixed Transformer Encoder Models and Rational Network Regression Heads // Proceedings of the 19th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2024). 2024. P. 483–492. URL: https://aclanthology.org/2024.bea-1.40/ (date accessed: 02.02.2026).</mixed-citation></citation-alternatives></ref><ref id="cit20"><label>20</label><citation-alternatives><mixed-citation xml:lang="ru">Alain G., Bengio Y. Understanding Intermediate Layers Using Linear Classifier Probes // arXiv preprint. 2017. arXiv:1610.01644.</mixed-citation><mixed-citation xml:lang="en">Alain G., Bengio Y. Understanding Intermediate Layers Using Linear Classifier Probes // arXiv preprint. 2017. arXiv:1610.01644.</mixed-citation></citation-alternatives></ref><ref id="cit21"><label>21</label><citation-alternatives><mixed-citation xml:lang="ru">Srivastava N., Hinton G., Krizhevsky A., Sutskever I., Salakhutdinov R. Dropout: A Simple Way to Prevent Neural Networks from Overfitting // Journal of Machine Learning Research. 2014. Vol. 15. No. 1. P. 1929–1958.</mixed-citation><mixed-citation xml:lang="en">Srivastava N., Hinton G., Krizhevsky A., Sutskever I., Salakhutdinov R. 
Dropout: A Simple Way to Prevent Neural Networks from Overfitting // Journal of Machine Learning Research. 2014. Vol. 15. No. 1. P. 1929–1958.</mixed-citation></citation-alternatives></ref><ref id="cit22"><label>22</label><citation-alternatives><mixed-citation xml:lang="ru">Hoerl A.E., Kennard R.W. Ridge Regression: Biased Estimation for Nonorthogonal Problems // Technometrics. 1970. Vol. 12. No. 1. P. 55–67. https://doi.org/10.1080/00401706.1970.10488634</mixed-citation><mixed-citation xml:lang="en">Hoerl A.E., Kennard R.W. Ridge Regression: Biased Estimation for Nonorthogonal Problems // Technometrics. 1970. Vol. 12. No. 1. P. 55–67. https://doi.org/10.1080/00401706.1970.10488634</mixed-citation></citation-alternatives></ref><ref id="cit23"><label>23</label><citation-alternatives><mixed-citation xml:lang="ru">Pichai S., Hassabis D., Kavukcuoglu K. A new era of intelligence with Gemini 3 // Google. The Keyword. URL: https://blog.google/products-and-platforms/products/gemini/gemini-3/#note-from-ceo (date accessed: 02.02.2026).</mixed-citation><mixed-citation xml:lang="en">Pichai S., Hassabis D., Kavukcuoglu K. A new era of intelligence with Gemini 3 // Google. The Keyword. URL: https://blog.google/products-and-platforms/products/gemini/gemini-3/#note-from-ceo (date accessed: 02.02.2026).</mixed-citation></citation-alternatives></ref><ref id="cit24"><label>24</label><citation-alternatives><mixed-citation xml:lang="ru">Elizarov A.M., Kirillovich A.V., Lipachev E.K., Nevzorova O.A. Digital Ecosystem OntoMath as an Approach to Building the Space of Mathematical Knowledge // Russian Digital Library Journal. 2023. Vol. 26. No. 2. P. 154–202. https://doi.org/10.26907/1562-5419-2023-26-2-154-202 (In Russ.)</mixed-citation><mixed-citation xml:lang="en">Elizarov A.M., Kirillovich A.V., Lipachev E.K., Nevzorova O.A. Digital Ecosystem OntoMath as an Approach to Building the Space of Mathematical Knowledge // Russian Digital Library Journal. 2023. Vol. 26. No. 2. P. 154–202. 
https://doi.org/10.26907/1562-5419-2023-26-2-154-202 (In Russ.)</mixed-citation></citation-alternatives></ref></ref-list><fn-group><fn fn-type="conflict"><p>The author declares no conflict of interest.</p></fn></fn-group></back></article>
