Vol 25, No 2 (2022)
104-120
Abstract
Today, users of mobile applications in various domains leave a huge digital footprint. The main types of digital footprint are text, photos, video, audio, and current location. To assist teachers in horizontal learning, a mobile application that collects all of the above types of digital footprint was developed, along with a web application that analyzes it.
121-136
Abstract
The article is devoted to automating the software design stage. The study analyzes why this stage is so important and why its automation is relevant. The main steps of the stage are reviewed, together with existing systems that automate each of them. In addition, the authors propose their own solution to the problem of class-structure refactoring, based on a combinatorial optimization method. The proposed method improves the quality of the class hierarchy and has been tested on a real model.
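The abstract does not give the paper's exact formulation, but class-structure refactoring by combinatorial optimization can be sketched as a local search that reassigns methods to classes so as to minimize cross-class call coupling. Everything below (the cost function, the hill-climbing strategy, all names) is an illustrative assumption, not the authors' method:

```python
# Hypothetical sketch: greedy hill-climbing that reassigns methods to classes
# to reduce inter-class coupling. Not the paper's actual algorithm.
import random

def coupling(assign, calls):
    """Number of call edges that cross a class boundary."""
    return sum(1 for a, b in calls if assign[a] != assign[b])

def refactor(methods, classes, calls, iters=1000, seed=0):
    rng = random.Random(seed)
    # start from a random assignment of methods to classes
    assign = {m: rng.choice(classes) for m in methods}
    best = coupling(assign, calls)
    for _ in range(iters):
        # try moving one random method to a random class
        m = rng.choice(methods)
        old = assign[m]
        assign[m] = rng.choice(classes)
        cost = coupling(assign, calls)
        if cost <= best:
            best = cost          # keep the move (never worsens)
        else:
            assign[m] = old      # revert
    return assign, best
```

A real formulation would also reward cohesion and penalize oversized classes; this sketch only shows the combinatorial-search skeleton.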
137-147
Abstract
The article is devoted to creating an effective solution for user segmentation. It presents an analysis of existing user segmentation services, of approaches to user segmentation (ABCDx segmentation, demographic segmentation, and segmentation based on a user journey map), and of clustering algorithms (K-means, Mini-Batch K-means, DBSCAN, Agglomerative Clustering, and Spectral Clustering). The study aims to create a “flexible” segmentation solution that adapts to each user sample. Analysis of variance (an ANOVA test) and clustering metrics are used to assess segmentation quality. On this basis, an effective user segmentation solution has been developed using advanced analytics and machine learning.
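To illustrate the first of the clustering algorithms named above, here is a minimal, self-contained K-means pass over two-dimensional user features. The feature choice (e.g. sessions per week, average order value) and all names are illustrative assumptions, not taken from the article:

```python
# Illustrative K-means sketch (not the authors' code): cluster 2-D user
# feature vectors into k segments.
import math

def kmeans(points, k, iters=50):
    # initialize centroids with the first k points (deterministic for the demo;
    # real code would use k-means++ or random restarts)
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        # assignment step: each point goes to its nearest centroid
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # update step: move each centroid to the mean of its cluster
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = [sum(x) / len(members) for x in zip(*members)]
    return labels, centroids
```

In practice one would validate the resulting segments with the clustering metrics and the ANOVA test mentioned in the abstract before acting on them.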
148-158
Abstract
The paper describes new approaches to collecting data on scientific publications from open-access systems in the field of Earth science. Based on the developed and adapted approaches, an archive of scientific publications (a repository) was created, together with a set of programs for collecting, searching, filtering, cataloging, and managing publications and their metadata. To improve the availability of publications and related data on the websites of the SGM RAS, the Wiki – Geology of Russia system was developed. The system is a thematic rubric on "Mineral deposits of Russia", with an additional topic, "Mineralogy". Every article must link to its source in the archive of scientific publications and may optionally include links on similar topics. Wiki – Geology of Russia is the first step toward a knowledge base on mineral deposits.
159-178
Abstract
Every year the global big data market grows, and analyzing these data is essential for good decision-making. Big data technologies bring significant cost reductions through cloud services and distributed file systems when large amounts of information must be stored. The quality of data analytics depends on the quality of the data themselves. This is especially important when data have a retention policy and migrate from one source to another, which increases the risk of data loss. Negative consequences of data migration are prevented through data reconciliation: a comprehensive verification of large amounts of information to confirm their consistency.
This article discusses probabilistic data structures that can be used to solve this problem and proposes an implementation: a data integrity verification module based on a Counting Bloom filter. The module is integrated into Apache Airflow to automate its invocation.
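As a rough sketch of the idea (not the module described in the article), a Counting Bloom filter can check two datasets for consistency by inserting every record from the source and deleting every record from the target; non-zero counters then signal a likely discrepancy, subject to the structure's usual false-positive rate. Sizes and hash choices below are illustrative:

```python
# Minimal Counting Bloom filter sketch for data reconciliation.
import hashlib

class CountingBloomFilter:
    def __init__(self, size=1024, hashes=3):
        self.size = size
        self.hashes = hashes
        self.counters = [0] * size

    def _positions(self, item):
        # derive `hashes` counter positions from salted SHA-256 digests
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.counters[pos] += 1

    def remove(self, item):
        for pos in self._positions(item):
            self.counters[pos] -= 1

    def is_balanced(self):
        # True when every insert was matched by a delete;
        # a non-zero counter hints that the datasets differ
        return all(c == 0 for c in self.counters)
```

In a reconciliation job, `add` would be fed rows from the migration source and `remove` rows from the target; wrapping that check as an Airflow task would automate its invocation, as the abstract describes.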
177-196
Abstract
The growing number of IT products with machine-learning features increases the relevance of automating machine-learning processes. MLOps techniques aim to provide training and efficient deployment of applications in a production environment by automating the surrounding infrastructure concerns that are not directly related to model development.
In this paper, we review the components, principles, and approaches of MLOps and analyze existing platforms and solutions for building machine learning pipelines. In addition, we propose an approach to building a machine learning pipeline based on basic DevOps tools and open-source libraries.
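The core pipeline pattern referred to above can be sketched as chained train, evaluate, and deploy stages with a quality gate before promotion. The stage names, the toy "model", and the error threshold are all illustrative assumptions, not details from the paper:

```python
# Hypothetical MLOps pipeline sketch: train -> evaluate -> gated deploy.
def train(data):
    # stand-in "model": predict the mean of the training labels
    mean = sum(data) / len(data)
    return lambda: mean

def evaluate(model, holdout):
    # mean absolute error of the constant predictor on the holdout set
    return sum(abs(model() - y) for y in holdout) / len(holdout)

def run_pipeline(data, holdout, max_error=1.0):
    model = train(data)
    error = evaluate(model, holdout)
    # quality gate: promote the model only if it clears the error threshold
    deployed = error <= max_error
    return model, error, deployed
```

In a real setup each stage would be a separate, versioned step (e.g. a CI job or a workflow-orchestrator task), which is what makes the gate auditable and repeatable.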
ISSN 1562-5419 (Online)