<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.3 20210610//EN" "JATS-journalpublishing1-3.dtd">
<article article-type="research-article" dtd-version="1.3" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xml:lang="ru"><front><journal-meta><journal-id journal-id-type="publisher-id">ellibs</journal-id><journal-title-group><journal-title xml:lang="ru">Электронные библиотеки</journal-title><trans-title-group xml:lang="en"><trans-title>Russian Digital Libraries Journal</trans-title></trans-title-group></journal-title-group><issn pub-type="epub">1562-5419</issn><publisher><publisher-name>Казанский (Приволжский) федеральный университет</publisher-name></publisher></journal-meta><article-meta><article-id custom-type="elpub" pub-id-type="custom">ellibs-52</article-id><article-categories><subj-group subj-group-type="heading"><subject>Research Article</subject></subj-group><subj-group subj-group-type="section-heading" xml:lang="ru"><subject>Статьи</subject></subj-group></article-categories><title-group><article-title>Об одном методе детектирования искусственных и ненаучных текстов в обширной коллекции документов</article-title><trans-title-group xml:lang="en"><trans-title>A method for detecting artificial and non-scientific texts in the collection of documents</trans-title></trans-title-group></title-group><contrib-group><contrib contrib-type="author" corresp="yes"><name-alternatives><name name-style="eastern" xml:lang="ru"><surname>Бахтеев</surname><given-names>О. Ю.</given-names></name></name-alternatives><email xlink:type="simple">bakhteev@ap-team.ru</email><xref ref-type="aff" rid="aff-1"/></contrib><contrib contrib-type="author" corresp="yes"><name-alternatives><name name-style="eastern" xml:lang="ru"><surname>Кузнецова</surname><given-names>М. 
В.</given-names></name></name-alternatives><email xlink:type="simple">kuznetsova@ap-team.ru</email><xref ref-type="aff" rid="aff-1"/></contrib><contrib contrib-type="author" corresp="yes"><name-alternatives><name name-style="eastern" xml:lang="ru"><surname>Романов</surname><given-names>А. В.</given-names></name></name-alternatives><email xlink:type="simple">alexey.romanov@phystech.edu</email><xref ref-type="aff" rid="aff-1"/></contrib><contrib contrib-type="author" corresp="yes"><name-alternatives><name name-style="eastern" xml:lang="ru"><surname>Чехович</surname><given-names>Ю. В.</given-names></name></name-alternatives><email xlink:type="simple">chehovich@antiplagiat.ru</email><xref ref-type="aff" rid="aff-1"/></contrib></contrib-group><aff xml:lang="ru" id="aff-1"><institution>Компания «Антиплагиат» (115093</institution><country>Russian Federation</country></aff><pub-date pub-type="collection"><year>2017</year></pub-date><pub-date pub-type="epub"><day>28</day><month>10</month><year>2017</year></pub-date><volume>20</volume><issue>5</issue><fpage>298</fpage><lpage>304</lpage><permissions><copyright-statement>Copyright &#x00A9; Бахтеев О.Ю., Кузнецова М.В., Романов А.В., Чехович Ю.В., 2017</copyright-statement><copyright-year>2017</copyright-year><copyright-holder xml:lang="ru">Бахтеев О.Ю., Кузнецова М.В., Романов А.В., Чехович Ю.В.</copyright-holder><copyright-holder xml:lang="en">Бахтеев О.Ю., Кузнецова М.В., Романов А.В., Чехович Ю.В.</copyright-holder><license xml:lang="ru" license-type="creative-commons-attribution" xlink:href="https://creativecommons.org/licenses/by/4.0/" xlink:type="simple"><license-p>Данная работа распространяется под лицензией Creative Commons Attribution 4.0.</license-p></license><license xml:lang="en" license-type="creative-commons-attribution" xlink:href="https://creativecommons.org/licenses/by/4.0/" xlink:type="simple"><license-p>This work is licensed under a Creative Commons Attribution 4.0 
License.</license-p></license></permissions><self-uri xlink:href="https://ellibs.elpub.ru/jour/article/view/52">https://ellibs.elpub.ru/jour/article/view/52</self-uri><abstract><p>Работа посвящена описанию метода детектирования искусственных и ненаучных текстов в коллекции научных статей. Предлагаемый метод основан на лексическом и морфологическом анализе проверяемого документа, позволяющем оценить вероятность его принадлежности к классу научных документов. Эксперименты подтверждают возможность практического применения метода.</p></abstract><trans-abstract xml:lang="en"><p>In this paper, we propose a method for detecting machine-generated and non-scientific texts in a collection of scientific papers. The method relies on lexical and morphological analysis of the examined document using language modeling, which makes it possible to estimate the probability that the text belongs to the class of scientific documents. Experiments confirm that the method is applicable in practice.</p></trans-abstract><kwd-group xml:lang="ru"><kwd>обработка естественного языка</kwd><kwd>классификация документов</kwd><kwd>анализ текстов</kwd><kwd>статистические языковые модели</kwd><kwd>детектирование искусственных текстов</kwd></kwd-group><kwd-group xml:lang="en"><kwd>natural language processing</kwd><kwd>document classification</kwd><kwd>text mining</kwd><kwd>statistical language models</kwd><kwd>machine-generated text detection</kwd></kwd-group></article-meta></front><back><ref-list><title>References</title><ref id="cit1"><label>1</label><citation-alternatives><mixed-citation xml:lang="ru">Arase Y., Zhou M. Machine Translation Detection from Monolingual Web-Text // ACL (1). 2013. P. 1597–1607.</mixed-citation><mixed-citation xml:lang="en">Arase Y., Zhou M. Machine Translation Detection from Monolingual Web-Text // ACL (1). 2013. P. 
1597–1607.</mixed-citation></citation-alternatives></ref><ref id="cit2"><label>2</label><citation-alternatives><mixed-citation xml:lang="ru">Labbé C., Labbé D. Duplicate and fake publications in the scientific literature: how many SCIgen papers in computer science? //Scientometrics. 2013. V. 94, No 1. P. 379–396.</mixed-citation><mixed-citation xml:lang="en">Labbé C., Labbé D. Duplicate and fake publications in the scientific literature: how many SCIgen papers in computer science? //Scientometrics. 2013. V. 94, No 1. P. 379–396.</mixed-citation></citation-alternatives></ref><ref id="cit3"><label>3</label><citation-alternatives><mixed-citation xml:lang="ru">Van Noorden R. Publishers withdraw more than 120 gibberish papers //Nature. 2014. V. 24.</mixed-citation><mixed-citation xml:lang="en">Van Noorden R. Publishers withdraw more than 120 gibberish papers //Nature. 2014. V. 24.</mixed-citation></citation-alternatives></ref><ref id="cit4"><label>4</label><citation-alternatives><mixed-citation xml:lang="ru">Гречников Е. А. и др. Поиск неестественных текстов // Тр. XI Всероссийской научной конференции «Электронные библиотеки: перспективные методы и технологии, электронные коллекции». Петрозаводск, 2009. С. 306–308.</mixed-citation><mixed-citation xml:lang="en">Гречников Е. А. и др. Поиск неестественных текстов // Тр. XI Всероссийской научной конференции «Электронные библиотеки: перспективные методы и технологии, электронные коллекции». Петрозаводск, 2009. С. 306–308.</mixed-citation></citation-alternatives></ref></ref-list><fn-group><fn fn-type="conflict"><p>The authors declare that there are no conflicts of interest present.</p></fn></fn-group></back></article>
