<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.3 20210610//EN" "JATS-journalpublishing1-3.dtd">
<article article-type="research-article" dtd-version="1.3" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xml:lang="ru"><front><journal-meta><journal-id journal-id-type="publisher-id">ellibs</journal-id><journal-title-group><journal-title xml:lang="ru">Электронные библиотеки</journal-title><trans-title-group xml:lang="en"><trans-title>Russian Digital Libraries Journal</trans-title></trans-title-group></journal-title-group><issn pub-type="epub">1562-5419</issn><publisher><publisher-name>Казанский (Приволжский) федеральный университет</publisher-name></publisher></journal-meta><article-meta><article-id pub-id-type="doi">10.26907/1562-5419-2020-23-6-1172-1191</article-id><article-id custom-type="elpub" pub-id-type="custom">ellibs-255</article-id><article-categories><subj-group subj-group-type="heading"><subject>Research Article</subject></subj-group><subj-group subj-group-type="section-heading" xml:lang="ru"><subject>Статьи</subject></subj-group></article-categories><title-group><article-title>Классификация изображений с использованием обучения с подкреплением</article-title><trans-title-group xml:lang="en"><trans-title>Image Classification Using Reinforcement Learning</trans-title></trans-title-group></title-group><contrib-group><contrib contrib-type="author" corresp="yes"><name-alternatives><name name-style="eastern" xml:lang="ru"><surname>Елизаров</surname><given-names>А. А.</given-names></name><name name-style="western" xml:lang="en"><surname>Elizarov</surname><given-names>A. A.</given-names></name></name-alternatives><email xlink:type="simple">artelizar@gmail.com</email></contrib><contrib contrib-type="author" corresp="yes"><name-alternatives><name name-style="eastern" xml:lang="ru"><surname>Разинков</surname><given-names>Е. В.</given-names></name><name name-style="western" xml:lang="en"><surname>Razinkov</surname><given-names>E. V.</given-names></name></name-alternatives><email xlink:type="simple">evgeny@razinkov.ai</email></contrib></contrib-group>
<pub-date pub-type="collection"><year>2020</year></pub-date><pub-date pub-type="epub"><day>28</day><month>12</month><year>2020</year></pub-date><volume>23</volume><issue>6</issue><fpage>1172</fpage><lpage>1191</lpage><permissions><copyright-statement>Copyright &#x00A9; Елизаров А.А., Разинков Е.В., 2020</copyright-statement><copyright-year>2020</copyright-year><copyright-holder xml:lang="ru">Елизаров А.А., Разинков Е.В.</copyright-holder><copyright-holder xml:lang="en">Elizarov A.A., Razinkov E.V.</copyright-holder><license xml:lang="ru" license-type="creative-commons-attribution" xlink:href="https://creativecommons.org/licenses/by/4.0/" xlink:type="simple"><license-p>Данная работа распространяется под лицензией Creative Commons Attribution 4.0.</license-p></license><license xml:lang="en" license-type="creative-commons-attribution" xlink:href="https://creativecommons.org/licenses/by/4.0/" xlink:type="simple"><license-p>This work is licensed under a Creative Commons Attribution 4.0 License.</license-p></license></permissions><self-uri xlink:href="https://ellibs.elpub.ru/jour/article/view/255">https://ellibs.elpub.ru/jour/article/view/255</self-uri><abstract><p>В последнее время активно развивается такое направление машинного обучения, как обучение с подкреплением. Как следствие, предпринимаются попытки использования обучения с подкреплением для решения задач компьютерного зрения, в частности задачи классификации изображений. Задачи компьютерного зрения являются на сегодняшний день одними из наиболее актуальных задач искусственного интеллекта.
</p><p>
В статье предложен метод классификации изображений в виде глубокой нейронной сети с использованием обучения с подкреплением. Идея разработанного метода сводится к решению задачи о контекстном многоруком бандите с помощью различных стратегий достижения компромисса между эксплуатацией и исследованием, а также алгоритмов обучения с подкреплением. Рассмотрены такие стратегии, как ε-жадная, ε-softmax, ε-decay-softmax и метод UCB1, и такие алгоритмы обучения с подкреплением, как DQN, REINFORCE и A2C. Проведен анализ влияния различных параметров на эффективность работы метода.
</p></abstract><trans-abstract xml:lang="en"><p>Reinforcement learning has recently become one of the most actively developing areas of machine learning. As a result, attempts are being made to apply it to computer vision problems, in particular to image classification. Computer vision problems are currently among the most pressing problems in artificial intelligence.
</p><p>
The article proposes an image classification method based on a deep neural network trained with reinforcement learning. The method reduces classification to a contextual multi-armed bandit problem, which is solved using various strategies for balancing exploitation and exploration together with reinforcement learning algorithms. The strategies considered are ε-greedy, ε-softmax, ε-decay-softmax, and UCB1; the reinforcement learning algorithms considered are DQN, REINFORCE, and A2C. The effect of various parameters on the method's performance is analyzed, and directions for further development of the method are proposed.
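</p><p>The contextual bandit formulation described above can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on stated assumptions: a tabular setting in which each context id stands in for an image, each arm is a class label, and the reward is 1 for the correct label and 0 otherwise. The deep network and the DQN, REINFORCE and A2C training are replaced here by sample-average value estimates, and the helper names (epsilon_greedy, ucb1, train_bandit_classifier, predict) are hypothetical. ε-greedy and UCB1 follow their standard definitions (UCB1 as in Auer et al., 2002); ε-softmax and ε-decay-softmax are omitted because their exact definitions are not given here.</p><p><code language="python">
```python
import math
import random


def epsilon_greedy(q_values, counts, rng, epsilon=0.1):
    """With probability epsilon pick a random arm, otherwise the best-estimated arm.

    counts is unused; it is accepted so that both strategies share one signature.
    """
    if epsilon > rng.random():
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def ucb1(q_values, counts, rng):
    """UCB1: try every arm once, then maximize Q(a) + sqrt(2 ln n / n_a).

    rng is unused; it is accepted so that both strategies share one signature.
    """
    for arm, n_a in enumerate(counts):
        if n_a == 0:
            return arm
    total = sum(counts)  # n: total number of plays in this context so far
    return max(
        range(len(q_values)),
        key=lambda a: q_values[a] + math.sqrt(2.0 * math.log(total) / counts[a]),
    )


def train_bandit_classifier(labels, n_classes, steps, select, seed=0):
    """Hypothetical tabular contextual bandit standing in for the deep network.

    labels[c] is the true class of context c; the learner never sees it directly,
    only the reward 1.0 (chosen label correct) or 0.0 (chosen label wrong).
    """
    rng = random.Random(seed)
    n_contexts = len(labels)
    q = [[0.0] * n_classes for _ in range(n_contexts)]
    counts = [[0] * n_classes for _ in range(n_contexts)]
    for _ in range(steps):
        ctx = rng.randrange(n_contexts)              # observe a context ("image")
        arm = select(q[ctx], counts[ctx], rng)       # choose a class label
        reward = 1.0 if arm == labels[ctx] else 0.0  # bandit feedback only
        counts[ctx][arm] += 1
        # Incremental sample-average update of the estimate Q(ctx, arm).
        q[ctx][arm] += (reward - q[ctx][arm]) / counts[ctx][arm]
    return q


def predict(q):
    """Greedy classification: the label with the highest estimated reward per context."""
    return [max(range(len(row)), key=lambda a: row[a]) for row in q]
```
</code></p><p>For example, predict(train_bandit_classifier([0, 2, 1, 3], 4, 4000, ucb1)) should return [0, 2, 1, 3]: UCB1 tries every arm once per context, and only the correct label ever yields a reward of 1. With ε-greedy the correct label is discovered only when random exploration happens to select it, which illustrates why the choice of exploration strategy matters.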
</p></trans-abstract><kwd-group xml:lang="ru"><kwd>машинное обучение</kwd><kwd>классификация изображений</kwd><kwd>обучение с подкреплением</kwd><kwd>задача о контекстном многоруком бандите</kwd></kwd-group><kwd-group xml:lang="en"><kwd>machine learning</kwd><kwd>image classification</kwd><kwd>reinforcement learning</kwd><kwd>contextual multi-armed bandit problem</kwd></kwd-group></article-meta></front><back><ref-list><title>References</title>
<ref id="cit1"><label>1</label><citation-alternatives><mixed-citation xml:lang="ru">Goodfellow I., Bengio Y., Courville A. Deep Learning. Cambridge, MA: The MIT Press, 2016. URL: https://www.deeplearningbook.org/.</mixed-citation><mixed-citation xml:lang="en">Goodfellow I., Bengio Y., Courville A. Deep Learning. Cambridge, MA: The MIT Press, 2016. URL: https://www.deeplearningbook.org/.</mixed-citation></citation-alternatives></ref>
<ref id="cit2"><label>2</label><citation-alternatives><mixed-citation xml:lang="ru">Krizhevsky A., Sutskever I., Hinton G. E. ImageNet Classification with Deep Convolutional Neural Networks // Advances in Neural Information Processing Systems, 2012. Vol. 25, No. 2. P. 1097–1105. DOI: 10.1145/3065386.</mixed-citation><mixed-citation xml:lang="en">Krizhevsky A., Sutskever I., Hinton G. E. ImageNet Classification with Deep Convolutional Neural Networks // Advances in Neural Information Processing Systems, 2012. Vol. 25, No. 2. P. 1097–1105. DOI: 10.1145/3065386.</mixed-citation></citation-alternatives></ref>
<ref id="cit3"><label>3</label><citation-alternatives><mixed-citation xml:lang="ru">Russakovsky O., Deng J., Su H. et al. ImageNet Large Scale Visual Recognition Challenge // International Journal of Computer Vision, 2015. Vol. 115, No. 3. P. 211–252. DOI: 10.1007/s11263-015-0816-y.</mixed-citation><mixed-citation xml:lang="en">Russakovsky O., Deng J., Su H. et al. ImageNet Large Scale Visual Recognition Challenge // International Journal of Computer Vision, 2015. Vol. 115, No. 3. P. 211–252. DOI: 10.1007/s11263-015-0816-y.</mixed-citation></citation-alternatives></ref>
<ref id="cit4"><label>4</label><citation-alternatives><mixed-citation xml:lang="ru">Sutton R. S., Barto A. G. Reinforcement Learning: An Introduction. Cambridge, MA: The MIT Press, 2018. URL: http://www.incompleteideas.net/book/RLbook2020.pdf/.</mixed-citation><mixed-citation xml:lang="en">Sutton R. S., Barto A. G. Reinforcement Learning: An Introduction. Cambridge, MA: The MIT Press, 2018. URL: http://www.incompleteideas.net/book/RLbook2020.pdf/.</mixed-citation></citation-alternatives></ref>
<ref id="cit5"><label>5</label><citation-alternatives><mixed-citation xml:lang="ru">Liu X., Xia T., Wang J. et al. Fully Convolutional Attention Networks for Fine-Grained Recognition // arXiv:1603.06765, 2017.</mixed-citation><mixed-citation xml:lang="en">Liu X., Xia T., Wang J. et al. Fully Convolutional Attention Networks for Fine-Grained Recognition // arXiv:1603.06765, 2017.</mixed-citation></citation-alternatives></ref>
<ref id="cit6"><label>6</label><citation-alternatives><mixed-citation xml:lang="ru">Li Z., Yang Y., Liu X. et al. Dynamic Computational Time for Visual Attention // arXiv:1703.10332, 2017.</mixed-citation><mixed-citation xml:lang="en">Li Z., Yang Y., Liu X. et al. Dynamic Computational Time for Visual Attention // arXiv:1703.10332, 2017.</mixed-citation></citation-alternatives></ref>
<ref id="cit7"><label>7</label><citation-alternatives><mixed-citation xml:lang="ru">He K., Zhang X., Ren S., Sun J. Deep Residual Learning for Image Recognition // Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2016. P. 770–778. DOI: 10.1109/CVPR.2016.90.</mixed-citation><mixed-citation xml:lang="en">He K., Zhang X., Ren S., Sun J. Deep Residual Learning for Image Recognition // Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2016. P. 770–778. DOI: 10.1109/CVPR.2016.90.</mixed-citation></citation-alternatives></ref>
<ref id="cit8"><label>8</label><citation-alternatives><mixed-citation xml:lang="ru">PyTorch, 2016. URL: https://pytorch.org/.</mixed-citation><mixed-citation xml:lang="en">PyTorch, 2016. URL: https://pytorch.org/.</mixed-citation></citation-alternatives></ref>
<ref id="cit9"><label>9</label><citation-alternatives><mixed-citation xml:lang="ru">Google Colaboratory, 2017. URL: https://colab.research.google.com/.</mixed-citation><mixed-citation xml:lang="en">Google Colaboratory, 2017. URL: https://colab.research.google.com/.</mixed-citation></citation-alternatives></ref>
<ref id="cit10"><label>10</label><citation-alternatives><mixed-citation xml:lang="ru">ImageNet Dataset, 2016. URL: http://image-net.org/.</mixed-citation><mixed-citation xml:lang="en">ImageNet Dataset, 2016. URL: http://image-net.org/.</mixed-citation></citation-alternatives></ref>
<ref id="cit11"><label>11</label><citation-alternatives><mixed-citation xml:lang="ru">Fine-Grained Image Classification, 2019. URL: https://paperswithcode.com/task/fine-grained-image-classification/.</mixed-citation><mixed-citation xml:lang="en">Fine-Grained Image Classification, 2019. URL: https://paperswithcode.com/task/fine-grained-image-classification/.</mixed-citation></citation-alternatives></ref>
<ref id="cit12"><label>12</label><citation-alternatives><mixed-citation xml:lang="ru">Girshick R. Fast R-CNN // Proceedings of the IEEE International Conference on Computer Vision, 2015. P. 1440–1448. DOI: 10.1109/ICCV.2015.169.</mixed-citation><mixed-citation xml:lang="en">Girshick R. Fast R-CNN // Proceedings of the IEEE International Conference on Computer Vision, 2015. P. 1440–1448. DOI: 10.1109/ICCV.2015.169.</mixed-citation></citation-alternatives></ref>
<ref id="cit13"><label>13</label><citation-alternatives><mixed-citation xml:lang="ru">Mnih V., Kavukcuoglu K., Silver D. et al. Playing Atari with Deep Reinforcement Learning // arXiv:1312.5602, 2013.</mixed-citation><mixed-citation xml:lang="en">Mnih V., Kavukcuoglu K., Silver D. et al. Playing Atari with Deep Reinforcement Learning // arXiv:1312.5602, 2013.</mixed-citation></citation-alternatives></ref>
<ref id="cit14"><label>14</label><citation-alternatives><mixed-citation xml:lang="ru">Abdolmaleki A., Springenberg J. T., Degrave J. et al. Relative Entropy Regularized Policy Iteration // arXiv:1812.02256, 2018.</mixed-citation><mixed-citation xml:lang="en">Abdolmaleki A., Springenberg J. T., Degrave J. et al. Relative Entropy Regularized Policy Iteration // arXiv:1812.02256, 2018.</mixed-citation></citation-alternatives></ref>
<ref id="cit15"><label>15</label><citation-alternatives><mixed-citation xml:lang="ru">Auer P., Cesa-Bianchi N., Fischer P. Finite-time Analysis of the Multiarmed Bandit Problem // Machine Learning, 2002. Vol. 47, No. 2–3. P. 235–256. DOI: 10.1023/A:1013689704352.</mixed-citation><mixed-citation xml:lang="en">Auer P., Cesa-Bianchi N., Fischer P. Finite-time Analysis of the Multiarmed Bandit Problem // Machine Learning, 2002. Vol. 47, No. 2–3. P. 235–256. DOI: 10.1023/A:1013689704352.</mixed-citation></citation-alternatives></ref>
</ref-list><fn-group><fn fn-type="conflict"><p>The authors declare no conflicts of interest.</p></fn></fn-group></back></article>
