<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.3 20210610//EN" "JATS-journalpublishing1-3.dtd">
<article article-type="research-article" dtd-version="1.3" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xml:lang="ru"><front><journal-meta><journal-id journal-id-type="publisher-id">ellibs</journal-id><journal-title-group><journal-title xml:lang="ru">Электронные библиотеки</journal-title><trans-title-group xml:lang="en"><trans-title>Russian Digital Libraries Journal</trans-title></trans-title-group></journal-title-group><issn pub-type="epub">1562-5419</issn><publisher><publisher-name>Казанский (Приволжский) федеральный университет</publisher-name></publisher></journal-meta><article-meta><article-id pub-id-type="doi">10.26907/1562-5419-2025-28-5-1103-1119</article-id><article-id custom-type="elpub" pub-id-type="custom">ellibs-611</article-id><article-categories><subj-group subj-group-type="heading"><subject>Research Article</subject></subj-group><subj-group subj-group-type="section-heading" xml:lang="ru"><subject>Статьи</subject></subj-group></article-categories><title-group><article-title>Оценка неопределенности в трансформерных цепях на основе принципа согласованности эффективной информации</article-title><trans-title-group xml:lang="en"><trans-title>Measuring Uncertainty in Transformer Circuits with Effective Information Consistency</trans-title></trans-title-group></title-group><contrib-group><contrib contrib-type="author" corresp="yes"><name-alternatives><name name-style="eastern" xml:lang="ru"><surname>Красновский</surname><given-names>Анатолий Анатольевич</given-names></name><name name-style="western" xml:lang="en"><surname>Krasnovsky</surname><given-names>Anatoly Anatolievich</given-names></name></name-alternatives><email xlink:type="simple">a.a.krasnovsky@gmail.com</email><xref ref-type="aff" rid="aff-1"/></contrib></contrib-group><aff-alternatives id="aff-1"><aff xml:lang="ru"><institution>Университет Иннополис</institution></aff><aff 
xml:lang="en"><institution>Innopolis University</institution></aff></aff-alternatives><pub-date pub-type="collection"><year>2025</year></pub-date><pub-date pub-type="epub"><day>19</day><month>12</month><year>2025</year></pub-date><volume>28</volume><issue>5</issue><fpage>1103</fpage><lpage>1119</lpage><permissions><copyright-statement>Copyright &#x00A9; Красновский А.А., 2025</copyright-statement><copyright-year>2025</copyright-year><copyright-holder xml:lang="ru">Красновский А.А.</copyright-holder><copyright-holder xml:lang="en">Krasnovsky A.A.</copyright-holder><license xml:lang="ru" license-type="creative-commons-attribution" xlink:href="https://creativecommons.org/licenses/by/4.0/" xlink:type="simple"><license-p>Данная работа распространяется под лицензией Creative Commons Attribution 4.0.</license-p></license><license xml:lang="en" license-type="creative-commons-attribution" xlink:href="https://creativecommons.org/licenses/by/4.0/" xlink:type="simple"><license-p>This work is licensed under a Creative Commons Attribution 4.0 License.</license-p></license></permissions><self-uri xlink:href="https://ellibs.elpub.ru/jour/article/view/611">https://ellibs.elpub.ru/jour/article/view/611</self-uri><abstract><p>Механистическая интерпретируемость позволяет выявлять функциональные подграфы в больших языковых моделях (LLM), известные как трансформерные цепи (Transformer Circuits, TC), которые реализуют конкретные алгоритмы. Однако отсутствует формальный способ, позволяющий за один проход количественно оценить, когда активная цепь ведет себя согласованно и, следовательно, ее состояние может быть признано корректным. Опираясь на ранее предложенную автором пучково‑теоретическую формализацию причинной эмерджентности (Krasnovsky, 2025), мы специализируем ее для трансформерных цепей и вводим безразмерную однопроходную оценку согласованности эффективной информации (Effective Information Consistency Score, EICS). 
EICS сочетает нормализованную несогласованность пучка, вычисляемую из локальных якобианов и активаций, с гауссовским прокси EI для причинной эмерджентности на уровне цепи, полученным из того же состояния прямого прохода. Такая конструкция является прозрачной (white‑box), однопроходной и делает единицы измерения явными, так что оценка безразмерна. Представлены практические рекомендации по интерпретации оценки, учету вычислительных затрат (с быстрыми и точными режимами) и анализ простейшего примера для проверки на адекватность.
</p></abstract><trans-abstract xml:lang="en"><p>Mechanistic interpretability has identified functional subgraphs within large language models (LLMs), known as Transformer Circuits (TCs), that appear to implement specific algorithms. Yet there is no formal, single-pass way to quantify when an active circuit is behaving coherently and is therefore likely to be trustworthy. Building on the author’s prior sheaf-theoretic formulation of causal emergence (Krasnovsky, 2025), we specialize it to transformer circuits and introduce the single-pass, dimensionless Effective-Information Consistency Score (EICS). EICS combines (i) a normalized sheaf inconsistency computed from local Jacobians and activations with (ii) a Gaussian proxy for effective information (EI) that captures circuit-level causal emergence and is derived from the same forward-pass state. The construction is white-box and single-pass, and it makes units explicit so that the score is dimensionless. We further provide practical guidance on score interpretation and computational overhead (with fast and exact modes), together with a toy sanity-check analysis.
</p></trans-abstract><kwd-group xml:lang="ru"><kwd>механистическая интерпретируемость</kwd><kwd>трансформерные цепи</kwd><kwd>теория пучков</kwd><kwd>причинная эмерджентность</kwd><kwd>количественная оценка неопределенности</kwd><kwd>большие языковые модели (LLM)</kwd></kwd-group><kwd-group xml:lang="en"><kwd>mechanistic interpretability</kwd><kwd>transformer circuits</kwd><kwd>sheaf theory</kwd><kwd>causal emergence</kwd><kwd>uncertainty quantification</kwd><kwd>large language models (LLMs)</kwd></kwd-group></article-meta></front><back><ref-list><title>References</title><ref id="cit1"><label>1</label><citation-alternatives><mixed-citation xml:lang="ru">Olsson C., Elhage N., Nanda N., et al. In-context Learning and Induction Heads. 2022. arXiv: 2209.11895.</mixed-citation><mixed-citation xml:lang="en">Olsson C., Elhage N., Nanda N., et al. In-context Learning and Induction Heads. 2022. arXiv: 2209.11895.</mixed-citation></citation-alternatives></ref><ref id="cit2"><label>2</label><citation-alternatives><mixed-citation xml:lang="ru">Anthropic. Circuit Tracing / Attribution Graphs: Methods &amp; Applications: Transformer Circuits Team. 2025. Access mode: https://transformer-circuits.pub/2025/attribution-graphs/ (accessed: 2025-08-20).</mixed-citation><mixed-citation xml:lang="en">Anthropic. Circuit Tracing / Attribution Graphs: Methods &amp; Applications: Transformer Circuits Team. 2025. Access mode: https://transformer-circuits.pub/2025/attribution-graphs/ (accessed: 2025-08-20).</mixed-citation></citation-alternatives></ref><ref id="cit3"><label>3</label><citation-alternatives><mixed-citation xml:lang="ru">Yao Y., Zhang N., Xi Z., Wang M., Xu Z., Deng S., Chen H. Knowledge Circuits in Pretrained Transformers // Advances in Neural Information Processing Systems (NeurIPS). 2024. Vol. 37. P. 118571–118602.</mixed-citation><mixed-citation xml:lang="en">Yao Y., Zhang N., Xi Z., Wang M., Xu Z., Deng S., Chen H. 
Knowledge Circuits in Pretrained Transformers // Advances in Neural Information Processing Systems (NeurIPS). 2024. Vol. 37. P. 118571–118602.</mixed-citation></citation-alternatives></ref><ref id="cit4"><label>4</label><citation-alternatives><mixed-citation xml:lang="ru">Krasnovsky A.A. Sheaf-Theoretic Causal Emergence for Resilience Analysis in Distributed Systems. 2025. arXiv: 2503.14104.</mixed-citation><mixed-citation xml:lang="en">Krasnovsky A.A. Sheaf-Theoretic Causal Emergence for Resilience Analysis in Distributed Systems. 2025. arXiv: 2503.14104.</mixed-citation></citation-alternatives></ref><ref id="cit5"><label>5</label><citation-alternatives><mixed-citation xml:lang="ru">Hansen J., Ghrist R. Toward a Spectral Theory of Cellular Sheaves // Journal of Applied and Computational Topology. 2019. Vol. 3, No. 4. P. 315–358.</mixed-citation><mixed-citation xml:lang="en">Hansen J., Ghrist R. Toward a Spectral Theory of Cellular Sheaves // Journal of Applied and Computational Topology. 2019. Vol. 3, No. 4. P. 315–358.</mixed-citation></citation-alternatives></ref><ref id="cit6"><label>6</label><citation-alternatives><mixed-citation xml:lang="ru">Robinson M. Topological Signal Processing. Springer, 2014.</mixed-citation><mixed-citation xml:lang="en">Robinson M. Topological Signal Processing. Springer, 2014.</mixed-citation></citation-alternatives></ref><ref id="cit7"><label>7</label><citation-alternatives><mixed-citation xml:lang="ru">Rosas F.E., Mediano P.A.M., Jensen H.J., Seth A.K., Barrett A.B., Carhart-Harris R.L., Bor D. Reconciling Emergences: An Information-Theoretic Approach to Identify Causal Emergence in Multivariate Data // PLOS Computational Biology. 2020. Vol. 16, No. 12. P. e1008289.</mixed-citation><mixed-citation xml:lang="en">Rosas F.E., Mediano P.A.M., Jensen H.J., Seth A.K., Barrett A.B., Carhart-Harris R.L., Bor D. Reconciling Emergences: An Information-Theoretic Approach to Identify Causal Emergence in Multivariate Data // PLOS Computational Biology. 2020. Vol. 16, No. 12. P. e1008289.</mixed-citation></citation-alternatives></ref><ref id="cit8"><label>8</label><citation-alternatives><mixed-citation xml:lang="ru">Tononi G., Sporns O. Measuring Information Integration // BMC Neuroscience. 2003. Vol. 4. P. 31.</mixed-citation><mixed-citation xml:lang="en">Tononi G., Sporns O. Measuring Information Integration // BMC Neuroscience. 2003. Vol. 4. P. 31.</mixed-citation></citation-alternatives></ref><ref id="cit9"><label>9</label><citation-alternatives><mixed-citation xml:lang="ru">Oizumi M., Albantakis L., Tononi G. From the Phenomenology to the Mechanisms of Consciousness: Integrated Information Theory 3.0 // PLOS Computational Biology. 2014. Vol. 10, No. 5. P. e1003588.</mixed-citation><mixed-citation xml:lang="en">Oizumi M., Albantakis L., Tononi G. From the Phenomenology to the Mechanisms of Consciousness: Integrated Information Theory 3.0 // PLOS Computational Biology. 2014. Vol. 10, No. 5. P. e1003588.</mixed-citation></citation-alternatives></ref><ref id="cit10"><label>10</label><citation-alternatives><mixed-citation xml:lang="ru">Angelopoulos A.N., Bates S. A Gentle Introduction to Conformal Prediction and Distribution-Free Uncertainty Quantification. 2021. arXiv: 2107.07511.</mixed-citation><mixed-citation xml:lang="en">Angelopoulos A.N., Bates S. A Gentle Introduction to Conformal Prediction and Distribution-Free Uncertainty Quantification. 2021. arXiv: 2107.07511.</mixed-citation></citation-alternatives></ref><ref id="cit11"><label>11</label><citation-alternatives><mixed-citation xml:lang="ru">Guo C., Pleiss G., Sun Y., Weinberger K.Q. On Calibration of Modern Neural Networks // Proceedings of the 34th International Conference on Machine Learning (ICML). PMLR. 2017. P. 1321–1330.</mixed-citation><mixed-citation xml:lang="en">Guo C., Pleiss G., Sun Y., Weinberger K.Q. On Calibration of Modern Neural Networks // Proceedings of the 34th International Conference on Machine Learning (ICML). PMLR. 2017. P. 1321–1330.</mixed-citation></citation-alternatives></ref><ref id="cit12"><label>12</label><citation-alternatives><mixed-citation xml:lang="ru">Lakshminarayanan B., Pritzel A., Blundell C. Simple and Scalable Predictive Uncertainty Estimation Using Deep Ensembles // Advances in Neural Information Processing Systems (NeurIPS). 2017. Vol. 30.</mixed-citation><mixed-citation xml:lang="en">Lakshminarayanan B., Pritzel A., Blundell C. Simple and Scalable Predictive Uncertainty Estimation Using Deep Ensembles // Advances in Neural Information Processing Systems (NeurIPS). 2017. Vol. 30.</mixed-citation></citation-alternatives></ref><ref id="cit13"><label>13</label><citation-alternatives><mixed-citation xml:lang="ru">Bayesian Low-rank Adaptation for Large Language Models (Laplace-LoRA). 2023. ICLR 2024 version. arXiv: 2308.13111.</mixed-citation><mixed-citation xml:lang="en">Bayesian Low-rank Adaptation for Large Language Models (Laplace-LoRA). 2023. ICLR 2024 version. arXiv: 2308.13111.</mixed-citation></citation-alternatives></ref><ref id="cit14"><label>14</label><citation-alternatives><mixed-citation xml:lang="ru">Hase P., Bansal M., Kim B., Ghandeharioun A. Does Localization Inform Editing? Surprising Differences in Causality-Based Localization vs. Knowledge Editing in Language Models // Advances in Neural Information Processing Systems (NeurIPS). 2023. Vol. 36. P. 17643–17668.</mixed-citation><mixed-citation xml:lang="en">Hase P., Bansal M., Kim B., Ghandeharioun A. Does Localization Inform Editing? Surprising Differences in Causality-Based Localization vs. Knowledge Editing in Language Models // Advances in Neural Information Processing Systems (NeurIPS). 2023. Vol. 36. P. 
17643–17668.</mixed-citation></citation-alternatives></ref><ref id="cit15"><label>15</label><citation-alternatives><mixed-citation xml:lang="ru">Huang L., Yu W., Ma W., Zhong W., Feng Z., Wang H., Chen Q., Peng W., Feng X., Qin B., et al. A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions // ACM Transactions on Information Systems. 2025. Vol. 43, No. 2. P. 1–55.</mixed-citation><mixed-citation xml:lang="en">Huang L., Yu W., Ma W., Zhong W., Feng Z., Wang H., Chen Q., Peng W., Feng X., Qin B., et al. A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions // ACM Transactions on Information Systems. 2025. Vol. 43, No. 2. P. 1–55.</mixed-citation></citation-alternatives></ref></ref-list><fn-group><fn fn-type="conflict"><p>The author declares that there are no conflicts of interest.</p></fn></fn-group></back></article>
