Russian Digital Libraries Journal

A Tool for Rapid Diagnostics of Memory in Neural Network Architectures of Language Models

https://doi.org/10.26907/1562-5419-2025-28-6-1346-1367

Abstract


Large Language Models (LLMs) have evolved from simple n-gram systems to modern universal architectures; however, a key limitation remains the quadratic complexity of the self-attention mechanism with respect to input sequence length. This significantly increases memory consumption and computational cost and, as tasks requiring extremely long contexts emerge, creates the need for new architectural solutions. Since evaluating a proposed architecture typically requires long and expensive full-scale training, a tool is needed that allows for a rapid preliminary assessment of a model’s internal memory capacity.
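
As a rough illustration of the scaling problem (a sketch under assumed head count and activation precision, not figures from the paper), the attention score matrix alone grows quadratically with sequence length:

    # Illustrative only: memory for a single layer's attention score matrix,
    # assuming 12 heads and 16-bit activations (assumptions, not values from
    # the paper). The (heads, n, n) tensor grows as O(n^2) in sequence length.
    def attention_score_bytes(seq_len: int, num_heads: int = 12, dtype_bytes: int = 2) -> int:
        return num_heads * seq_len * seq_len * dtype_bytes

    for n in (1_024, 8_192, 65_536):
        print(f"n={n:>6}: {attention_score_bytes(n) / 2**30:.2f} GiB per layer")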


This paper presents a method for quantitative evaluation of the internal memory of neural network architectures based on synthetic tests that do not require large data corpora. Internal memory is defined as the amount of information a model can reproduce without direct access to its original inputs.
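
One minimal way to turn this definition into a number (a sketch only; the framework’s actual metric may differ) is token-level recall accuracy over the portion of the output that the model must reproduce from memory:

    # Hypothetical memory score: fraction of payload tokens reproduced
    # correctly when the original input is no longer visible to the model.
    from typing import Sequence

    def memory_score(predicted: Sequence[int], target: Sequence[int]) -> float:
        if not target:
            return 0.0
        return sum(p == t for p, t in zip(predicted, target)) / len(target)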


To validate the approach, a software framework was developed and tested on the GPT-2 and Mamba architectures. The experiments employed copy, inversion, and associative retrieval tasks. Comparison of prediction accuracy, error distribution, and computational cost enables a fast assessment of the efficiency and potential of various LLM architectures.
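
To make the setup concrete, the three task families can be sketched as simple sequence generators; the token ids, separator conventions, and lengths below are illustrative assumptions rather than the framework’s actual data format:

    # Hypothetical generators for the copy, inversion, and associative
    # retrieval tasks; each returns (input tokens, expected output tokens).
    import random

    SEP, QUERY = 0, 1            # assumed special tokens
    VOCAB = list(range(2, 66))   # assumed payload alphabet

    def copy_task(n):
        """Input: payload + separator; expected output: the payload itself."""
        payload = random.choices(VOCAB, k=n)
        return payload + [SEP], payload

    def inversion_task(n):
        """Expected output: the payload in reverse order."""
        payload = random.choices(VOCAB, k=n)
        return payload + [SEP], payload[::-1]

    def associative_retrieval_task(n_pairs):
        """Key-value pairs followed by a query; expected output: the queried value."""
        keys = random.sample(VOCAB, k=n_pairs)
        values = random.choices(VOCAB, k=n_pairs)
        kv = [tok for pair in zip(keys, values) for tok in pair]
        query_key = random.choice(keys)
        return kv + [QUERY, query_key], [values[keys.index(query_key)]]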

About the Authors

Pavel Andreevich Gavrikov
Moscow Institute of Physics and Technology
Russian Federation


Azamat Komiljon Usmanov
Moscow Institute of Physics and Technology
Russian Federation


Dmitriy Revayev
Moscow Institute of Physics and Technology
Russian Federation


Sergey Nikolaevich Buzykanov
Moscow Institute of Physics and Technology
Russian Federation




For citations:


Gavrikov P.A., Usmanov A.K., Revayev D., Buzykanov S.N. A Tool for Rapid Diagnostics of Memory in Neural Network Architectures of Language Models. Russian Digital Libraries Journal. 2025;28(6):1346-1367. (In Russ.) https://doi.org/10.26907/1562-5419-2025-28-6-1346-1367



This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 1562-5419 (Online)