<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.3 20210610//EN" "JATS-journalpublishing1-3.dtd">
<article article-type="research-article" dtd-version="1.3" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xml:lang="ru"><front><journal-meta><journal-id journal-id-type="publisher-id">ellibs</journal-id><journal-title-group><journal-title xml:lang="ru">Электронные библиотеки</journal-title><trans-title-group xml:lang="en"><trans-title>Russian Digital Libraries Journal</trans-title></trans-title-group></journal-title-group><issn pub-type="epub">1562-5419</issn><publisher><publisher-name>Казанский (Приволжский) федеральный университет</publisher-name></publisher></journal-meta><article-meta><article-id pub-id-type="doi">10.26907/1562-5419-2025-28-6-1282-1305</article-id><article-id custom-type="elpub" pub-id-type="custom">ellibs-620</article-id><article-categories><subj-group subj-group-type="heading"><subject>Research Article</subject></subj-group><subj-group subj-group-type="section-heading" xml:lang="ru"><subject>Статьи</subject></subj-group></article-categories><title-group><article-title>Детекция галлюцинаций на основе внутренних состояний больших языковых моделей</article-title><trans-title-group xml:lang="en"><trans-title>Detection of Hallucinations Based on the Internal States of Large Language Models</trans-title></trans-title-group></title-group><contrib-group><contrib contrib-type="author" corresp="yes"><name-alternatives><name name-style="eastern" xml:lang="ru"><surname>Айсин</surname><given-names>Тимур Рустемович</given-names></name><name name-style="western" xml:lang="en"><surname>Aisin</surname><given-names>Timur Rustemovich</given-names></name></name-alternatives><email xlink:type="simple">aysin.timur@gmail.com</email><xref ref-type="aff" rid="aff-1"/></contrib><contrib contrib-type="author" corresp="yes"><name-alternatives><name name-style="eastern" xml:lang="ru"><surname>Шамардина</surname><given-names>Татьяна Вячеславовна</given-names></name><name 
name-style="western" xml:lang="en"><surname>Shamardina</surname><given-names>Tatiana Vyacheslavovna</given-names></name></name-alternatives><email xlink:type="simple">shamardina.tatiana@gmail.com</email><xref ref-type="aff" rid="aff-1"/></contrib></contrib-group><aff-alternatives id="aff-1"><aff xml:lang="ru"><institution>Московский физико-технический институт</institution></aff><aff xml:lang="en"><institution>Moscow Institute of Physics and Technology</institution></aff></aff-alternatives><pub-date pub-type="collection"><year>2025</year></pub-date><pub-date pub-type="epub"><day>19</day><month>12</month><year>2025</year></pub-date><volume>28</volume><issue>6</issue><fpage>1282</fpage><lpage>1305</lpage><permissions><copyright-statement>Copyright &#x00A9; Айсин Т.Р., Шамардина Т.В., 2025</copyright-statement><copyright-year>2025</copyright-year><copyright-holder xml:lang="ru">Айсин Т.Р., Шамардина Т.В.</copyright-holder><copyright-holder xml:lang="en">Aisin T.R., Shamardina T.V.</copyright-holder><license xml:lang="ru" license-type="creative-commons-attribution" xlink:href="https://creativecommons.org/licenses/by/4.0/" xlink:type="simple"><license-p>Данная работа распространяется под лицензией Creative Commons Attribution 4.0.</license-p></license><license xml:lang="en" license-type="creative-commons-attribution" xlink:href="https://creativecommons.org/licenses/by/4.0/" xlink:type="simple"><license-p>This work is licensed under a Creative Commons Attribution 4.0 License.</license-p></license></permissions><self-uri xlink:href="https://ellibs.elpub.ru/jour/article/view/620">https://ellibs.elpub.ru/jour/article/view/620</self-uri><abstract><p>В последние годы большие языковые модели (Large Language Models, LLM) достигли значительных успехов в области обработки естественного языка и стали ключевым инструментом для решения широкого спектра прикладных и исследовательских задач. 
Однако с ростом их масштабов и возможностей все более острой становится проблема галлюцинаций – генерации ложной, недостоверной или несуществующей информации, представленной в достоверной форме. В связи с этим вопросы анализа природы галлюцинаций и разработки методов их выявления приобретают особую научную и практическую значимость.


В работе изучен феномен галлюцинаций в больших языковых моделях, рассмотрены их существующая классификация и возможные причины. На базе модели Flan-T5 также исследованы различия внутренних состояний модели при генерации галлюцинаций и верных ответов. На основе этих расхождений представлены два способа детектирования галлюцинаций: с помощью карт внимания и скрытых состояний модели. Эти методы протестированы на данных из бенчмарков HaluEval и Shroom 2024 в задачах суммаризации, ответов на вопросы, перефразирования, машинного перевода и генерации определений. Кроме того, исследована переносимость обученных детекторов между различными типами галлюцинаций, что позволило оценить универсальность предложенных методов для различных типов задач.
</p></abstract><trans-abstract xml:lang="en"><p>In recent years, large language models (LLMs) have achieved substantial progress in natural language processing and have become key tools for a wide range of applied and research problems. However, as their scale and capabilities grow, the problem of hallucinations (the generation of false, unreliable, or nonexistent information presented in a credible manner) has become increasingly acute. Consequently, analyzing the nature of hallucinations and developing methods for their detection have acquired particular scientific and practical significance.


This study examines the phenomenon of hallucinations in large language models, reviews their existing classification, and considers their possible causes. Using the Flan-T5 model, we analyze how the model’s internal states differ when it generates hallucinations versus correct responses. Based on these differences, we present two approaches to hallucination detection: one based on attention maps and the other on the model’s hidden states. Both methods are evaluated on data from the HaluEval and Shroom 2024 benchmarks in summarization, question answering, paraphrasing, machine translation, and definition generation tasks. In addition, we study the transferability of the trained detectors across different hallucination types, which allows us to assess how universal the proposed methods are across task types.
</p></trans-abstract><kwd-group xml:lang="ru"><kwd>большие языковые модели</kwd><kwd>галлюцинации</kwd><kwd>детекция</kwd><kwd>Flan-T5</kwd><kwd>обработка естественного языка</kwd><kwd>карты внимания</kwd><kwd>внутренние состояния</kwd><kwd>HaluEval</kwd><kwd>Shroom</kwd></kwd-group><kwd-group xml:lang="en"><kwd>large language models</kwd><kwd>hallucinations</kwd><kwd>detection</kwd><kwd>Flan-T5</kwd><kwd>natural language processing</kwd><kwd>attention maps</kwd><kwd>hidden states</kwd><kwd>HaluEval</kwd><kwd>Shroom</kwd></kwd-group></article-meta></front><back><ref-list><title>References</title><ref id="cit1"><label>1</label><citation-alternatives><mixed-citation xml:lang="ru">Vaswani A., Shazeer N., Parmar N. et al. Attention is all you need // Advances in Neural Information Processing Systems. 2017. Vol. 30.</mixed-citation><mixed-citation xml:lang="en">Vaswani A., Shazeer N., Parmar N. et al. Attention is all you need // Advances in Neural Information Processing Systems. 2017. Vol. 30.</mixed-citation></citation-alternatives></ref><ref id="cit2"><label>2</label><citation-alternatives><mixed-citation xml:lang="ru">Huang L., Yu W., Ma W. et al. A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions // ACM Transactions on Information Systems. 2025. Vol. 43, No. 2. P. 1–55. https://doi.org/10.1145/3703155</mixed-citation><mixed-citation xml:lang="en">Huang L., Yu W., Ma W. et al. A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions // ACM Transactions on Information Systems. 2025. Vol. 43, No. 2. P. 1–55. https://doi.org/10.1145/3703155</mixed-citation></citation-alternatives></ref><ref id="cit3"><label>3</label><citation-alternatives><mixed-citation xml:lang="ru">Li J., Cheng X., Zhao W. X. et al. 
HaluEval: A Large-Scale Hallucination Evaluation Benchmark for Large Language Models // Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. 2023. P. 6449–6464. https://doi.org/10.18653/v1/2023.emnlp-main.397</mixed-citation><mixed-citation xml:lang="en">Li J., Cheng X., Zhao W. X. et al. HaluEval: A Large-Scale Hallucination Evaluation Benchmark for Large Language Models // Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. 2023. P. 6449–6464. https://doi.org/10.18653/v1/2023.emnlp-main.397</mixed-citation></citation-alternatives></ref><ref id="cit4"><label>4</label><citation-alternatives><mixed-citation xml:lang="ru">Mickus T., Zosa E., Vázquez R. et al. SemEval-2024 Task 6: SHROOM, a Shared-task on Hallucinations and Related Observable Overgeneration Mistakes // International Workshop on Semantic Evaluation. 2024. https://doi.org/10.18653/v1/2024.semeval-1.273</mixed-citation><mixed-citation xml:lang="en">Mickus T., Zosa E., Vázquez R. et al. SemEval-2024 Task 6: SHROOM, a Shared-task on Hallucinations and Related Observable Overgeneration Mistakes // International Workshop on Semantic Evaluation. 2024. https://doi.org/10.18653/v1/2024.semeval-1.273</mixed-citation></citation-alternatives></ref><ref id="cit5"><label>5</label><citation-alternatives><mixed-citation xml:lang="ru">Carlini N., Ippolito D., Jagielski M. et al. Quantifying Memorization Across Neural Language Models // The Eleventh International Conference on Learning Representations. 2023. https://doi.org/10.48550/arXiv.2202.07646</mixed-citation><mixed-citation xml:lang="en">Carlini N., Ippolito D., Jagielski M. et al. Quantifying Memorization Across Neural Language Models // The Eleventh International Conference on Learning Representations. 2023. 
https://doi.org/10.48550/arXiv.2202.07646</mixed-citation></citation-alternatives></ref><ref id="cit6"><label>6</label><citation-alternatives><mixed-citation xml:lang="ru">Lin S., Hilton J., Evans O. TruthfulQA: Measuring How Models Mimic Human Falsehoods // Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics. 2022. Vol. 1. P. 3214–3252. https://doi.org/10.18653/v1/2022.acl-long.229</mixed-citation><mixed-citation xml:lang="en">Lin S., Hilton J., Evans O. TruthfulQA: Measuring How Models Mimic Human Falsehoods // Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics. 2022. Vol. 1. P. 3214–3252. https://doi.org/10.18653/v1/2022.acl-long.229</mixed-citation></citation-alternatives></ref><ref id="cit7"><label>7</label><citation-alternatives><mixed-citation xml:lang="ru">Li D., Rawat A.S., Zaheer M. et al. Large Language Models with Controllable Working Memory // Findings of the Association for Computational Linguistics: ACL 2023. 2023. P. 1774–1793. https://doi.org/10.18653/v1/2023.findings-acl.112</mixed-citation><mixed-citation xml:lang="en">Li D., Rawat A.S., Zaheer M. et al. Large Language Models with Controllable Working Memory // Findings of the Association for Computational Linguistics: ACL 2023. 2023. P. 1774–1793. https://doi.org/10.18653/v1/2023.findings-acl.112</mixed-citation></citation-alternatives></ref><ref id="cit8"><label>8</label><citation-alternatives><mixed-citation xml:lang="ru">Sharma M., Tong M., Korbak T. et al. Towards Understanding Sycophancy in Language Models // The Twelfth International Conference on Learning Representations. 2024. https://doi.org/10.48550/arXiv.2310.13548</mixed-citation><mixed-citation xml:lang="en">Sharma M., Tong M., Korbak T. et al. Towards Understanding Sycophancy in Language Models // The Twelfth International Conference on Learning Representations. 2024. 
https://doi.org/10.48550/arXiv.2310.13548</mixed-citation></citation-alternatives></ref><ref id="cit9"><label>9</label><citation-alternatives><mixed-citation xml:lang="ru">Reinforcement Learning from Human Feedback: Progress and Challenges // YouTube. URL: https://www.youtube.com/watch?v=hhiLw5Q_UFg (дата обращения: 04.05.2025)</mixed-citation><mixed-citation xml:lang="en">Reinforcement Learning from Human Feedback: Progress and Challenges // YouTube. URL: https://www.youtube.com/watch?v=hhiLw5Q_UFg (accessed: 04.05.2025)</mixed-citation></citation-alternatives></ref><ref id="cit10"><label>10</label><citation-alternatives><mixed-citation xml:lang="ru">Chuang Y.S., Xie Y., Luo H. et al. DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models // ArXiv. 2023. Vol. abs/2309.03883. https://doi.org/10.48550/arXiv.2309.03883</mixed-citation><mixed-citation xml:lang="en">Chuang Y.S., Xie Y., Luo H. et al. DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models // ArXiv. 2023. Vol. abs/2309.03883. https://doi.org/10.48550/arXiv.2309.03883</mixed-citation></citation-alternatives></ref><ref id="cit11"><label>11</label><citation-alternatives><mixed-citation xml:lang="ru">Voita E., Talbot D., Moiseev F. et al. Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned // Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 2019. P. 5797–5808. https://doi.org/10.18653/v1/P19-1580</mixed-citation><mixed-citation xml:lang="en">Voita E., Talbot D., Moiseev F. et al. Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned // Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 2019. P. 5797–5808. 
https://doi.org/10.18653/v1/P19-1580</mixed-citation></citation-alternatives></ref><ref id="cit12"><label>12</label><citation-alternatives><mixed-citation xml:lang="ru">Min S., Krishna K., Lyu X. et al. FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation // Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. 2023. P. 12076–12100. https://doi.org/10.18653/v1/2023.emnlp-main.741</mixed-citation><mixed-citation xml:lang="en">Min S., Krishna K., Lyu X. et al. FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation // Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. 2023. P. 12076–12100. https://doi.org/10.18653/v1/2023.emnlp-main.741</mixed-citation></citation-alternatives></ref><ref id="cit13"><label>13</label><citation-alternatives><mixed-citation xml:lang="ru">Luo Z., Xie Q., Ananiadou S. ChatGPT as a Factual Inconsistency Evaluator for Text Summarization // ArXiv. 2023. Vol. abs/2303.15621. https://doi.org/10.48550/arXiv.2303.15621</mixed-citation><mixed-citation xml:lang="en">Luo Z., Xie Q., Ananiadou S. ChatGPT as a Factual Inconsistency Evaluator for Text Summarization // ArXiv. 2023. Vol. abs/2303.15621. https://doi.org/10.48550/arXiv.2303.15621</mixed-citation></citation-alternatives></ref><ref id="cit14"><label>14</label><citation-alternatives><mixed-citation xml:lang="ru">Manakul P., Liusie A., Gales M.J. SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models // Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. 2023. P. 9004–9017. https://doi.org/10.18653/v1/2023.emnlp-main.557</mixed-citation><mixed-citation xml:lang="en">Manakul P., Liusie A., Gales M.J. 
SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models // Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. 2023. P. 9004–9017. https://doi.org/10.18653/v1/2023.emnlp-main.557</mixed-citation></citation-alternatives></ref><ref id="cit15"><label>15</label><citation-alternatives><mixed-citation xml:lang="ru">Cohen R., Hamri M., Geva M. et al. LM vs LM: Detecting Factual Errors via Cross Examination // Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. 2023. P. 12621–12640. https://doi.org/10.18653/v1/2023.emnlp-main.778</mixed-citation><mixed-citation xml:lang="en">Cohen R., Hamri M., Geva M. et al. LM vs LM: Detecting Factual Errors via Cross Examination // Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. 2023. P. 12621–12640. https://doi.org/10.18653/v1/2023.emnlp-main.778</mixed-citation></citation-alternatives></ref><ref id="cit16"><label>16</label><citation-alternatives><mixed-citation xml:lang="ru">Xiao Y., Wang W.Y. On Hallucination and Predictive Uncertainty in Conditional Language Generation // Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume. 2021. P. 2734–2744. https://doi.org/10.18653/v1/2021.eacl-main.236</mixed-citation><mixed-citation xml:lang="en">Xiao Y., Wang W.Y. On Hallucination and Predictive Uncertainty in Conditional Language Generation // Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume. 2021. P. 2734–2744. https://doi.org/10.18653/v1/2021.eacl-main.236</mixed-citation></citation-alternatives></ref><ref id="cit17"><label>17</label><citation-alternatives><mixed-citation xml:lang="ru">Miao N., Teh Y.W., Rainforth T. 
SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning // The Twelfth International Conference on Learning Representations. 2024. https://doi.org/10.48550/arXiv.2308.00436</mixed-citation><mixed-citation xml:lang="en">Miao N., Teh Y.W., Rainforth T. SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning // The Twelfth International Conference on Learning Representations. 2024. https://doi.org/10.48550/arXiv.2308.00436</mixed-citation></citation-alternatives></ref><ref id="cit18"><label>18</label><citation-alternatives><mixed-citation xml:lang="ru">Adlakha V., BehnamGhader P., Lu X.H. et al. Evaluating Correctness and Faithfulness of Instruction-Following Models for Question Answering // Transactions of the Association for Computational Linguistics. 2024. Vol. 12. P. 681–699. https://doi.org/10.1162/tacl_a_00667</mixed-citation><mixed-citation xml:lang="en">Adlakha V., BehnamGhader P., Lu X.H. et al. Evaluating Correctness and Faithfulness of Instruction-Following Models for Question Answering // Transactions of the Association for Computational Linguistics. 2024. Vol. 12. P. 681–699. https://doi.org/10.1162/tacl_a_00667</mixed-citation></citation-alternatives></ref><ref id="cit19"><label>19</label><citation-alternatives><mixed-citation xml:lang="ru">Lin Chin-Yew. ROUGE: A Package for Automatic Evaluation of Summaries // Text Summarization Branches Out. 2004. P. 74–81. ISBN: 9781932432466</mixed-citation><mixed-citation xml:lang="en">Lin Chin-Yew. ROUGE: A Package for Automatic Evaluation of Summaries // Text Summarization Branches Out. 2004. P. 74–81. ISBN: 9781932432466</mixed-citation></citation-alternatives></ref><ref id="cit20"><label>20</label><citation-alternatives><mixed-citation xml:lang="ru">Venkit P.N., Gautam S., Panchanadikar R. et al. Nationality Bias in Text Generation // Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics. 2023. P. 116–122. 
https://doi.org/10.18653/v1/2023.eacl-main.9</mixed-citation><mixed-citation xml:lang="en">Venkit P.N., Gautam S., Panchanadikar R. et al. Nationality Bias in Text Generation // Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics. 2023. P. 116–122. https://doi.org/10.18653/v1/2023.eacl-main.9</mixed-citation></citation-alternatives></ref><ref id="cit21"><label>21</label><citation-alternatives><mixed-citation xml:lang="ru">Goodrich B., Rao V., Liu P.J. et al. Assessing The Factual Accuracy of Generated Text // Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2019. P. 166–175. https://doi.org/10.1145/3292500.3330955</mixed-citation><mixed-citation xml:lang="en">Goodrich B., Rao V., Liu P.J. et al. Assessing The Factual Accuracy of Generated Text // Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2019. P. 166–175. https://doi.org/10.1145/3292500.3330955</mixed-citation></citation-alternatives></ref><ref id="cit22"><label>22</label><citation-alternatives><mixed-citation xml:lang="ru">Laban P., Schnabel T., Bennett P.N. et al. SummaC: Re-Visiting NLI-based Models for Inconsistency Detection in Summarization // Transactions of the Association for Computational Linguistics. 2022. Vol. 10. P. 163–177. https://doi.org/10.1162/tacl_a_00453</mixed-citation><mixed-citation xml:lang="en">Laban P., Schnabel T., Bennett P.N. et al. SummaC: Re-Visiting NLI-based Models for Inconsistency Detection in Summarization // Transactions of the Association for Computational Linguistics. 2022. Vol. 10. P. 163–177. https://doi.org/10.1162/tacl_a_00453</mixed-citation></citation-alternatives></ref><ref id="cit23"><label>23</label><citation-alternatives><mixed-citation xml:lang="ru">Xu W., Agrawal S., Briakou E. et al. 
Understanding and Detecting Hallucinations in Neural Machine Translation via Model Introspection // Transactions of the Association for Computational Linguistics. 2023. Vol. 11. P. 546–564. https://doi.org/10.1162/tacl_a_00563</mixed-citation><mixed-citation xml:lang="en">Xu W., Agrawal S., Briakou E. et al. Understanding and Detecting Hallucinations in Neural Machine Translation via Model Introspection // Transactions of the Association for Computational Linguistics. 2023. Vol. 11. P. 546–564. https://doi.org/10.1162/tacl_a_00563</mixed-citation></citation-alternatives></ref><ref id="cit24"><label>24</label><citation-alternatives><mixed-citation xml:lang="ru">Zhang T., Qiu L., Guo Q. et al. Enhancing Uncertainty-Based Hallucination Detection with Stronger Focus // Conference on Empirical Methods in Natural Language Processing. 2023. https://doi.org/10.18653/v1/2023.emnlp-main.58</mixed-citation><mixed-citation xml:lang="en">Zhang T., Qiu L., Guo Q. et al. Enhancing Uncertainty-Based Hallucination Detection with Stronger Focus // Conference on Empirical Methods in Natural Language Processing. 2023. https://doi.org/10.18653/v1/2023.emnlp-main.58</mixed-citation></citation-alternatives></ref><ref id="cit25"><label>25</label><citation-alternatives><mixed-citation xml:lang="ru">Chuang Y.S., Qiu L., Hsieh C.Y. et al. Lookback Lens: Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps // Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing. 2024. P. 1419–1436. https://doi.org/10.18653/v1/2024.emnlp-main.84</mixed-citation><mixed-citation xml:lang="en">Chuang Y.S., Qiu L., Hsieh C.Y. et al. Lookback Lens: Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps // Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing. 2024. P. 1419–1436. 
https://doi.org/10.18653/v1/2024.emnlp-main.84</mixed-citation></citation-alternatives></ref><ref id="cit26"><label>26</label><citation-alternatives><mixed-citation xml:lang="ru">Yin Z., Sun Q., Guo Q. et al. Do Large Language Models Know What They Don't Know? // Annual Meeting of the Association for Computational Linguistics. 2023. https://doi.org/10.18653/v1/2023.findings-acl.551</mixed-citation><mixed-citation xml:lang="en">Yin Z., Sun Q., Guo Q. et al. Do Large Language Models Know What They Don't Know? // Annual Meeting of the Association for Computational Linguistics. 2023. https://doi.org/10.18653/v1/2023.findings-acl.551</mixed-citation></citation-alternatives></ref><ref id="cit27"><label>27</label><citation-alternatives><mixed-citation xml:lang="ru">Marks S., Tegmark M. The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets // First Conference on Language Modeling. 2024. https://doi.org/10.48550/arXiv.2310.06824</mixed-citation><mixed-citation xml:lang="en">Marks S., Tegmark M. The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets // First Conference on Language Modeling. 2024. https://doi.org/10.48550/arXiv.2310.06824</mixed-citation></citation-alternatives></ref><ref id="cit28"><label>28</label><citation-alternatives><mixed-citation xml:lang="ru">Su W., Wang C., Ai Q. et al. Unsupervised Real-Time Hallucination Detection based on the Internal States of Large Language Models // Annual Meeting of the Association for Computational Linguistics. 2024. https://doi.org/10.48550/arXiv.2403.06448</mixed-citation><mixed-citation xml:lang="en">Su W., Wang C., Ai Q. et al. Unsupervised Real-Time Hallucination Detection based on the Internal States of Large Language Models // Annual Meeting of the Association for Computational Linguistics. 2024. 
https://doi.org/10.48550/arXiv.2403.06448</mixed-citation></citation-alternatives></ref><ref id="cit29"><label>29</label><citation-alternatives><mixed-citation xml:lang="ru">Chung H.W., Hou L., Longpre S. et al. Scaling Instruction-Finetuned Language Models // Journal of Machine Learning Research. 2024. Vol. 25, No. 70. P. 1–53. https://doi.org/10.5555/3722577.3722647</mixed-citation><mixed-citation xml:lang="en">Chung H.W., Hou L., Longpre S. et al. Scaling Instruction-Finetuned Language Models // Journal of Machine Learning Research. 2024. Vol. 25, No. 70. P. 1–53. https://doi.org/10.5555/3722577.3722647</mixed-citation></citation-alternatives></ref><ref id="cit30"><label>30</label><citation-alternatives><mixed-citation xml:lang="ru">Hochreiter Sepp, Schmidhuber Jürgen. Long Short-Term Memory // Neural Computation. 1997. Vol. 9, No. 8. P. 1735–1780. https://doi.org/10.1162/neco.1997.9.8.1735</mixed-citation><mixed-citation xml:lang="en">Hochreiter Sepp, Schmidhuber Jürgen. Long Short-Term Memory // Neural Computation. 1997. Vol. 9, No. 8. P. 1735–1780. https://doi.org/10.1162/neco.1997.9.8.1735</mixed-citation></citation-alternatives></ref><ref id="cit31"><label>31</label><citation-alternatives><mixed-citation xml:lang="ru">PCA // Wikipedia. URL: https://en.wikipedia.org/wiki/Principal_component_analysis (дата обращения: 13.06.2025).</mixed-citation><mixed-citation xml:lang="en">PCA // Wikipedia. URL: https://en.wikipedia.org/wiki/Principal_component_analysis (accessed: 13.06.2025).</mixed-citation></citation-alternatives></ref></ref-list><fn-group><fn fn-type="conflict"><p>The authors declare no conflicts of interest.</p></fn></fn-group></back></article>
