Measuring Uncertainty in Transformer Circuits with Effective Information Consistency
https://doi.org/10.26907/1562-5419-2025-28-5-1103-1119
Abstract
Mechanistic interpretability has identified functional subgraphs within large language models (LLMs), known as Transformer Circuits (TCs), that appear to implement specific algorithms. Yet we lack a formal, single-pass way to quantify when an active circuit is behaving coherently and is therefore likely trustworthy. Building on the author's prior sheaf-theoretic formulation of causal emergence (Krasnovsky, 2025), we specialize it to transformer circuits and introduce the Effective-Information Consistency Score (EICS). EICS combines (i) a normalized sheaf inconsistency computed from local Jacobians and activations with (ii) a Gaussian effective-information (EI) proxy for circuit-level causal emergence derived from the same forward state. The construction is white-box, requires only a single forward pass, and makes units explicit, so the score is dimensionless. We further provide practical guidance on score interpretation and computational overhead (with fast and exact modes), together with a toy sanity-check analysis.
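The abstract describes EICS only at a high level. As an informal illustration (not the paper's reference implementation), the Python sketch below shows one way such a score could be assembled from a single forward pass: an edge-wise, normalized sheaf-inconsistency residual over local Jacobians and activations, a linear-Gaussian effective-information proxy, and a dimensionless combination of the two. All function names, normalizations, and the combination rule here are assumptions introduced for exposition.

    # Illustrative sketch of an EICS-style score (assumed names and normalizations).
    import numpy as np

    def sheaf_inconsistency(acts, edges, jacobians, eps=1e-8):
        """Normalized residual of edge-wise linear consistency constraints.

        acts:      dict node -> activation vector from one forward pass
        edges:     list of (src, dst) pairs in the circuit subgraph
        jacobians: dict (src, dst) -> local Jacobian acting as a restriction-map proxy
        Returns a dimensionless value in [0, inf); 0 means perfectly consistent.
        """
        num, den = 0.0, 0.0
        for (u, v) in edges:
            pred = jacobians[(u, v)] @ acts[u]          # map source activation toward target
            num += np.linalg.norm(pred - acts[v]) ** 2  # disagreement on the edge
            den += np.linalg.norm(acts[v]) ** 2         # scale for normalization
        return num / (den + eps)

    def gaussian_ei_proxy(circuit_jacobian, noise_var=1.0):
        """Effective-information proxy for a linear-Gaussian channel.

        Uses 0.5 * log det(J J^T + sigma^2 I) - 0.5 * d * log(sigma^2),
        a standard mutual-information form for a linear map with Gaussian noise.
        """
        jjt = circuit_jacobian @ circuit_jacobian.T
        d = jjt.shape[0]
        _, logdet = np.linalg.slogdet(jjt + noise_var * np.eye(d))
        return 0.5 * (logdet - d * np.log(noise_var))   # >= 0, in nats

    def eics_score(acts, edges, jacobians, circuit_jacobian):
        """Combine inconsistency and the EI proxy into one dimensionless score in (0, 1];
        higher values indicate a more coherent, more 'emergent' circuit state."""
        inc = sheaf_inconsistency(acts, edges, jacobians)
        ei = gaussian_ei_proxy(circuit_jacobian)
        return (1.0 / (1.0 + inc)) * (ei / (1.0 + ei))

    # Toy usage with random data (stand-in for real circuit activations/Jacobians).
    rng = np.random.default_rng(0)
    acts = {"head": rng.normal(size=4), "mlp": rng.normal(size=4)}
    edges = [("head", "mlp")]
    jacobians = {("head", "mlp"): rng.normal(size=(4, 4))}
    print(eics_score(acts, edges, jacobians, jacobians[("head", "mlp")]))

In this sketch the multiplicative combination is only one plausible way to keep the score dimensionless; the paper's actual normalization and fast/exact computation modes may differ.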
References
1. Olsson C., Elhage N., Nanda N., et al. In-context Learning and Induction Heads. 2022. arXiv: 2209.11895.
2. Anthropic (Transformer Circuits Team). Circuit Tracing / Attribution Graphs: Methods & Applications. 2025. Access mode: https://transformer-circuits.pub/2025/attribution-graphs/ (accessed: 2025-08-20).
3. Yao Y., Zhang N., Xi Z., Wang M., Xu Z., Deng S., and Chen H. Knowledge Circuits in Pretrained Transformers // Advances in Neural Information Processing Systems (NeurIPS). 2024. Vol. 37. P. 118571–118602.
4. Krasnovsky A.A. Sheaf-Theoretic Causal Emergence for Resilience Analysis in Distributed Systems. 2025. arXiv: 2503.14104.
5. Hansen J., Ghrist R. Toward a Spectral Theory of Cellular Sheaves // Journal of Applied and Computational Topology. 2019. Vol. 3, No. 4. P. 315–358.
6. Robinson M. Topological Signal Processing. Springer, 2014.
7. Rosas F.E., Mediano P.A.M., Jensen H.J., Seth A.K., Barrett A.B., Carhart-Harris R.L., and Bor D. Reconciling Emergences: An Information-Theoretic Approach to Identify Causal Emergence in Multivariate Data // PLOS Computational Biology. 2020. Vol. 16, No. 12. P. e1008289.
8. Tononi G., Sporns O. Measuring Information Integration // BMC Neuroscience. 2003. Vol. 4. P. 31.
9. Oizumi M., Albantakis L., Tononi G. From the Phenomenology to the Mechanisms of Consciousness: Integrated Information Theory 3.0 // PLOS Computational Biology. 2014. Vol. 10, No. 5. P. e1003588.
10. Angelopoulos A.N., Bates S. A Gentle Introduction to Conformal Prediction and Distribution-Free Uncertainty Quantification. 2021. arXiv: 2107.07511.
11. Guo C., Pleiss G., Sun Y., and Weinberger K.Q. On Calibration of Modern Neural Networks // Proceedings of the 34th International Conference on Machine Learning (ICML). PMLR. 2017. P. 1321–1330.
12. Lakshminarayanan B., Pritzel A., Blundell C. Simple and Scalable Predictive Uncertainty Estimation Using Deep Ensembles // Advances in Neural Information Processing Systems (NeurIPS). 2017. Vol. 30.
13. Yang A.X., Robeyns M., Wang X., Aitchison L. Bayesian Low-Rank Adaptation for Large Language Models (Laplace-LoRA) // International Conference on Learning Representations (ICLR). 2024. arXiv: 2308.13111.
14. Hase P., Bansal M., Kim B., Ghandeharioun A. Does Localization Inform Editing? Surprising Differences in Causality-Based Localization vs. Knowledge Editing in Language Models // Advances in Neural Information Processing Systems (NeurIPS). 2023. Vol. 36. P. 17643–17668.
15. Huang L., Yu W., Ma W., Zhong W., Feng Z., Wang H., Chen Q., Peng W., Feng X., Qin B., et al. A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions // ACM Transactions on Information Systems. 2025. Vol. 43, No. 2. P. 1–55.
For citations:
Krasnovsky A.A. Measuring Uncertainty in Transformer Circuits with Effective Information Consistency. Russian Digital Libraries Journal. 2025;28(5):1103-1119. (In Russ.) https://doi.org/10.26907/1562-5419-2025-28-5-1103-1119















