<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.3 20210610//EN" "JATS-journalpublishing1-3.dtd">
<article article-type="research-article" dtd-version="1.3" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xml:lang="ru"><front><journal-meta><journal-id journal-id-type="publisher-id">ellibs</journal-id><journal-title-group><journal-title xml:lang="ru">Электронные библиотеки</journal-title><trans-title-group xml:lang="en"><trans-title>Russian Digital Libraries Journal</trans-title></trans-title-group></journal-title-group><issn pub-type="epub">1562-5419</issn><publisher><publisher-name>Казанский (Приволжский) федеральный университет</publisher-name></publisher></journal-meta><article-meta><article-id pub-id-type="doi">10.26907/1562-5419-2023-26-4-437-455</article-id><article-id custom-type="elpub" pub-id-type="custom">ellibs-382</article-id><article-categories><subj-group subj-group-type="heading"><subject>Research Article</subject></subj-group><subj-group subj-group-type="section-heading" xml:lang="ru"><subject>Статьи</subject></subj-group></article-categories><title-group><article-title>Нейронная сеть для генерации изображений на основе текста песен с применением моделей OpenAI и CLIP</article-title><trans-title-group xml:lang="en"><trans-title>Neural Network for Generating Images Based on Song Lyrics using OpenAI and CLIP Models</trans-title></trans-title-group></title-group><contrib-group><contrib contrib-type="author" corresp="yes"><name-alternatives><name name-style="eastern" xml:lang="ru"><surname>Давлетгареева</surname><given-names>А. Р.</given-names></name><name name-style="western" xml:lang="en"><surname>Davletgareeva</surname><given-names>A. R.</given-names></name></name-alternatives><email xlink:type="simple">alsudavletgareeva@gmail.com</email><xref ref-type="aff" rid="aff-1"/></contrib><contrib contrib-type="author" corresp="yes"><name-alternatives><name name-style="eastern" xml:lang="ru"><surname>Едкова</surname><given-names>К. 
А.</given-names></name><name name-style="western" xml:lang="en"><surname>Edkova</surname><given-names>K. A.</given-names></name></name-alternatives><email xlink:type="simple">ksushka.e21@gmail.com</email><xref ref-type="aff" rid="aff-1"/></contrib></contrib-group><aff-alternatives id="aff-1"><aff xml:lang="ru"><institution>Казанский (Приволжский) федеральный университет</institution></aff><aff xml:lang="en"><institution>Kazan (Volga region) Federal University</institution></aff></aff-alternatives><pub-date pub-type="collection"><year>2023</year></pub-date><pub-date pub-type="epub"><day>28</day><month>08</month><year>2023</year></pub-date><volume>26</volume><issue>4</issue><elocation-id>437–455</elocation-id><permissions><copyright-statement>Copyright &#x00A9; Давлетгареева А.Р., Едкова К.А., 2023</copyright-statement><copyright-year>2023</copyright-year><copyright-holder xml:lang="ru">Давлетгареева А.Р., Едкова К.А.</copyright-holder><copyright-holder xml:lang="en">Davletgareeva A.R., Edkova K.A.</copyright-holder><license xml:lang="ru" license-type="creative-commons-attribution" xlink:href="https://creativecommons.org/licenses/by/4.0/" xlink:type="simple"><license-p>Данная работа распространяется под лицензией Creative Commons Attribution 4.0.</license-p></license><license xml:lang="en" license-type="creative-commons-attribution" xlink:href="https://creativecommons.org/licenses/by/4.0/" xlink:type="simple"><license-p>This work is licensed under a Creative Commons Attribution 4.0 License.</license-p></license></permissions><self-uri xlink:href="https://ellibs.elpub.ru/jour/article/view/382">https://ellibs.elpub.ru/jour/article/view/382</self-uri><abstract><p>Исследована эффективность моделей ImageNet diffusion model и CLIP для генерации изображений по текстовому описанию. С использованием различных текстовых вводов на разных параметрах проведены два эксперимента для определения лучших параметров при генерации изображений на основе текстового описания. 
Результаты показали, что, хотя ImageNet хорошо справляется с созданием изображений, CLIP лучше обеспечивает соединение текстовых подсказок с релевантными изображениями. Полученные результаты характеризуют высокий потенциал объединения названных моделей для создания высококачественных и контекстно релевантных изображений на основе текстового описания.
</p></abstract><trans-abstract xml:lang="en"><p>This study investigates the effectiveness of the ImageNet diffusion model and CLIP for generating images from text descriptions. Two experiments with various text inputs and parameter settings were conducted to determine the best parameters for text-based image generation. The results show that, while the ImageNet diffusion model generates images well, CLIP is better at linking text prompts to relevant images. These findings indicate the high potential of combining the two models to create high-quality, contextually relevant images from text descriptions.
</p></trans-abstract><kwd-group xml:lang="ru"><kwd>генерация изображений</kwd><kwd>глубокое обучение</kwd><kwd>нейронные сети</kwd><kwd>обработка естественного языка</kwd></kwd-group><kwd-group xml:lang="en"><kwd>ImageNet diffusion model</kwd><kwd>CLIP</kwd></kwd-group></article-meta></front><back><ref-list><title>References</title><ref id="cit1"><label>1</label><citation-alternatives><mixed-citation xml:lang="ru">Elasri M., Elharrouss O., Al-Maadeed S., Tairi H. Image Generation: A Review // Neural Processing Letters. 2022. Vol. 54. No. 5. P. 4609–4646.</mixed-citation><mixed-citation xml:lang="en">Elasri M., Elharrouss O., Al-Maadeed S., Tairi H. Image Generation: A Review // Neural Processing Letters. 2022. Vol. 54. No. 5. P. 4609–4646.</mixed-citation></citation-alternatives></ref><ref id="cit2"><label>2</label><citation-alternatives><mixed-citation xml:lang="ru">Zhang H., Song H., Li S., Zhou M., Song D. A survey of controllable text generation using transformer-based pre-trained language models // arXiv preprint arXiv:2201.05337. 2022.</mixed-citation><mixed-citation xml:lang="en">Zhang H., Song H., Li S., Zhou M., Song D. A survey of controllable text generation using transformer-based pre-trained language models // arXiv preprint arXiv:2201.05337. 2022.</mixed-citation></citation-alternatives></ref><ref id="cit3"><label>3</label><citation-alternatives><mixed-citation xml:lang="ru">Основы генеративно-состязательных сетей. URL: https://habr.com/ru/articles/726254/</mixed-citation><mixed-citation xml:lang="en">Fundamentals of generative adversarial networks (in Russian). URL: https://habr.com/ru/articles/726254/</mixed-citation></citation-alternatives></ref><ref id="cit4"><label>4</label><citation-alternatives><mixed-citation xml:lang="ru">Brown T., Mann B., Ryder N., Subbiah M., Kaplan J.D., Dhariwal P., Neelakantan A., Shyam P., Sastry G., Askell A.,
Agarwal S., Herbert-Voss A., Krueger G., Henighan T., Child R., Ramesh A., Ziegler D.M., Wu J., Winter C., Hesse C., Chen M., Sigler E., Litwin M., Gray S., Chess B., Clark J., Berner C., McCandlish S., Radford A., Sutskever I., Amodei D. Language models are few-shot learners // Advances in neural information processing systems. 2020. Vol. 33. P. 1877–1901.</mixed-citation><mixed-citation xml:lang="en">Brown T., Mann B., Ryder N., Subbiah M., Kaplan J.D., Dhariwal P., Neelakantan A., Shyam P., Sastry G., Askell A., Agarwal S., Herbert-Voss A., Krueger G., Henighan T., Child R., Ramesh A., Ziegler D.M., Wu J., Winter C., Hesse C., Chen M., Sigler E., Litwin M., Gray S., Chess B., Clark J., Berner C., McCandlish S., Radford A., Sutskever I., Amodei D. Language models are few-shot learners // Advances in neural information processing systems. 2020. Vol. 33. P. 1877–1901.</mixed-citation></citation-alternatives></ref><ref id="cit5"><label>5</label><citation-alternatives><mixed-citation xml:lang="ru">DALL⋅E 2. URL: https://openai.com/product/dall-e-2.</mixed-citation><mixed-citation xml:lang="en">DALL⋅E 2. URL: https://openai.com/product/dall-e-2.</mixed-citation></citation-alternatives></ref><ref id="cit6"><label>6</label><citation-alternatives><mixed-citation xml:lang="ru">How AI is Transforming Text-to-Image Generation. URL: https://nesesho.com/index.php/2023/04/12/how-ai-is-transforming-text-to-image-generation/</mixed-citation><mixed-citation xml:lang="en">How AI is Transforming Text-to-Image Generation. URL: https://nesesho.com/index.php/2023/04/12/how-ai-is-transforming-text-to-image-generation/</mixed-citation></citation-alternatives></ref><ref id="cit7"><label>7</label><citation-alternatives><mixed-citation xml:lang="ru">OpenAI⋅GitHub. URL: https://github.com/openai.</mixed-citation><mixed-citation xml:lang="en">OpenAI⋅GitHub. 
URL: https://github.com/openai.</mixed-citation></citation-alternatives></ref><ref id="cit8"><label>8</label><citation-alternatives><mixed-citation xml:lang="ru">Gulrajani I., Ahmed F., Arjovsky M., Dumoulin V., Courville A.C. Improved training of Wasserstein GANs // Advances in neural information processing systems. 2017. Vol. 30. P. 5767–5777.</mixed-citation><mixed-citation xml:lang="en">Gulrajani I., Ahmed F., Arjovsky M., Dumoulin V., Courville A.C. Improved training of Wasserstein GANs // Advances in neural information processing systems. 2017. Vol. 30. P. 5767–5777.</mixed-citation></citation-alternatives></ref><ref id="cit9"><label>9</label><citation-alternatives><mixed-citation xml:lang="ru">Indolia S., Goswami A.K., Mishra S.P., Asopa P. Conceptual understanding of convolutional neural network-a deep learning approach // Procedia computer science. 2018. Vol. 132. P. 679–688.</mixed-citation><mixed-citation xml:lang="en">Indolia S., Goswami A.K., Mishra S.P., Asopa P. Conceptual understanding of convolutional neural network-a deep learning approach // Procedia computer science. 2018. Vol. 132. P. 679–688.</mixed-citation></citation-alternatives></ref><ref id="cit10"><label>10</label><citation-alternatives><mixed-citation xml:lang="ru">Laudani A., Lozito G.M., Fulginei F.R., Salvini A. On training efficiency and computational costs of a feed forward neural network: a review // Computational intelligence and neuroscience. 2015. P. 83–83.</mixed-citation><mixed-citation xml:lang="en">Laudani A., Lozito G.M., Fulginei F.R., Salvini A. On training efficiency and computational costs of a feed forward neural network: a review // Computational intelligence and neuroscience. 2015. P. 83–83.</mixed-citation></citation-alternatives></ref><ref id="cit11"><label>11</label><citation-alternatives><mixed-citation xml:lang="ru">CLIP. URL: https://github.com/openai/CLIP.</mixed-citation><mixed-citation xml:lang="en">CLIP. 
URL: https://github.com/openai/CLIP.</mixed-citation></citation-alternatives></ref><ref id="cit12"><label>12</label><citation-alternatives><mixed-citation xml:lang="ru">Dhariwal P., Nichol A. Diffusion models beat GANs on image synthesis // Advances in Neural Information Processing Systems. 2021. Vol. 34. P. 8780–8794.</mixed-citation><mixed-citation xml:lang="en">Dhariwal P., Nichol A. Diffusion models beat GANs on image synthesis // Advances in Neural Information Processing Systems. 2021. Vol. 34. P. 8780–8794.</mixed-citation></citation-alternatives></ref><ref id="cit13"><label>13</label><citation-alternatives><mixed-citation xml:lang="ru">Kim G., Kwon T., Ye J.C. DiffusionCLIP: Text-guided diffusion models for robust image manipulation // Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022. P. 2426–2435.</mixed-citation><mixed-citation xml:lang="en">Kim G., Kwon T., Ye J.C. DiffusionCLIP: Text-guided diffusion models for robust image manipulation // Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022. P. 2426–2435.</mixed-citation></citation-alternatives></ref></ref-list><fn-group><fn fn-type="conflict"><p>The authors declare no conflicts of interest.</p></fn></fn-group></back></article>
