
Russian Digital Libraries Journal


Hiding in Meaning: Semantic Encoding for Generative Text Steganography

https://doi.org/10.26907/1562-5419-2025-28-5-1165-1185

Abstract


We propose a novel framework for steganographic text generation that hides binary messages within semantically coherent natural language using latent-space conditioning of large language models (LLMs). Secret messages are first encoded into continuous vectors via a learned binary-to-latent mapping, and these vectors then guide text generation through prefix tuning. Unlike prior token-level or syntactic steganography, our method avoids explicit word manipulation and instead operates entirely within the latent semantic space, enabling more fluent and less detectable outputs. On the receiver side, the latent representation is recovered from the generated text and decoded back into the original message. As a key theoretical contribution, we provide a robustness guarantee: if the recovered latent vector lies within a bounded distance of the original, exact message reconstruction is ensured, with the bound determined by the decoder’s Lipschitz continuity and the minimum logit margin. This formal result offers a principled view of the reliability–capacity trade-off in latent steganographic systems. Empirical evaluation on both synthetic data and real-world domains such as Amazon reviews shows that our method achieves high message recovery accuracy (above 91%), strong text fluency, and competitive capacity of up to 6 bits per sentence element, while maintaining resilience against neural steganalysis. These findings demonstrate that latent-conditioned generation offers a secure and practical pathway for embedding information in modern LLMs.
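As an illustration of the stated guarantee, the following is a minimal formal sketch; the symbols z, ẑ, g, L, γ, and m are assumed notation and may differ from the definitions used in the paper. Suppose the decoder g maps a latent vector to one logit per candidate symbol for each message bit, and is L-Lipschitz in the sense that \( \|g(\hat z) - g(z)\|_\infty \le L\,\|\hat z - z\|_2 \). Let \( \gamma = \min_i \bigl( g(z)_{i,m_i} - \max_{b \ne m_i} g(z)_{i,b} \bigr) > 0 \) denote the minimum logit margin of the true message m. Then

\[
\|\hat z - z\|_2 < \frac{\gamma}{2L}
\;\Longrightarrow\;
\arg\max_{b}\, g(\hat z)_{i,b} = m_i \quad \text{for every bit } i,
\]

since no logit can shift by more than \( \gamma/2 \), the correct symbol keeps the largest logit at every position and the message is reconstructed exactly. This is one plausible reading of how the Lipschitz constant and the logit margin jointly determine the recovery radius.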

About the Authors

Oleg Yurievich Rogov
Artificial Intelligence Research Institute
Russian Federation


Dmitrii Evgenievich Indenbom
Moscow Institute of Physics and Technology
Russian Federation


Dmitrii Sergeevich Korzh
Artificial Intelligence Research Institute
Russian Federation


Darya Valeryaevna Pugacheva
Artificial Intelligence Research Institute
Russian Federation


Vsevolod Alexandrovich Voronov
Moscow Institute of Physics and Technology
Russian Federation


Elena Viktorovna Tutubalina
Artificial Intelligence Research Institute
Russian Federation




For citations:


Rogov O.Yu., Indenbom D.E., Korzh D.S., Pugacheva D.V., Voronov V.A., Tutubalina E.V. Hiding in Meaning: Semantic Encoding for Generative Text Steganography. Russian Digital Libraries Journal. 2025;28(5):1165-1185. (In Russ.) https://doi.org/10.26907/1562-5419-2025-28-5-1165-1185




This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 1562-5419 (Online)