References

ellibs

Электронные библиотеки

Russian Digital Libraries Journal

1562-5419

Казанский (Приволжский) федеральный университет

10.26907/1562-5419-2026-29-1-123-144

ellibs-724

Research Article

Статьи

Интеллектуальный сервис мультимодального нейросетевого мониторинга области наблюдения

Intelligent Multimodal Neural Network Monitoring Service for the Surveillance Area

Миннеахметов

Разиль Рустемович

Minneakhmetov

Razil Rustemovich

razil0071999@gmail.com

Казанский (Приволжский) федеральный университетKazan (Volga region) Federal University

2026

04032026

291123144

2026

Миннеахметов Р.Р.

Minneakhmetov R.R.

Данная работа распространяется под лицензией Creative Commons Attribution 4.0.

This work is licensed under a Creative Commons Attribution 4.0 License.

https://ellibs.elpub.ru/jour/article/view/724

Представлен подход к разработке интеллектуального сервиса мультимодального мониторинга области наблюдения с использованием больших нейросетевых моделей. Предлагаемое решение способно анализировать разнородные данные: видеопотоки, сигналы датчиков окружающей среды (температура, влажность и пр.) и журналы событий – для получения целостной картины происходящего. В качестве основных инструментов задействованы крупные языковые и визуальные модели (например, LLaMA, MiniCPM‑V и др.), развернутые локально с помощью платформы Ollama, что обеспечивает автономную и безопасную обработку информации без необходимости передачи данных на удаленные сервера. Разработан прототип системы, работающий в офлайн-режиме и способный выявлять критические ситуации, аномальные отклонения от нормы и контекстно значимые события в наблюдаемой зоне. Описана методика формирования тестовых сценариев и проведения качественной оценки работы модели по метрикам F1-мера, Precision, Recall. Результаты экспериментов подтвердили применимость мультимодальных моделей для решения задач мониторинга: прототип успешно распознает сложные паттерны поведения и демонстрирует потенциал больших моделей в построении адаптивных и масштабируемых систем наблюдения.

The article presents an approach to the development of an intelligent multimodal monitoring service for the surveillance area using large neural network models. The proposed solution is capable of analyzing heterogeneous data – video streams, environmental sensor signals (temperature, humidity, etc.), and event logs – to obtain a complete picture of what is happening. The main tools used are large language and visual models (for example, LLaMA, MiniCPM‑V, etc.) deployed locally using the Ollama platform, which provides autonomous and secure information processing without the need to transfer data to the cloud. A prototype system has been developed that works offline and is capable of detecting critical situations, abnormal deviations from the norm and contextually significant events in the observed area. The method of forming test scenarios and conducting a qualitative assessment of the model's performance using the metrics F1-measure, Precision, Recall on a set of various situations is described. The experimental results confirm the applicability of multimodal models for monitoring tasks: the prototype successfully recognizes complex patterns of behavior and demonstrates the potential of large models in building adaptive and scalable surveillance systems.

интеллектуальный сервисмультимодальный мониторингOllamaбольшие языковые моделиотслеживание активностейвидеоаналитикаискусственный интеллект

intelligent servicemultimodal monitoringOllamaLarge Language Modelsactivity trackingvideo analyticsartificial intelligence

References1

Onsu M.A., Lohan P., Kantarci B., Syed A., Andrews M., Kennedy S. Leveraging Multimodal Large Language Models Assisted by Instance Segmentation for Intelligent Traffic Monitoring [Electronic resource] // arXiv. 2025. Available at: https://arxiv.org/abs/2502.11304 (accessed: 15.05.2025).

Ferrara E. Large Language Models for Wearable Sensor-Based Human Activity Recognition, Health Monitoring, and Behavioral Modeling // Sensors. 2024. Vol. 24, No. 15. Article 5045.

Suh S., Rey V.F., Lukowicz P. Tasked: Transformer-Based Adversarial Learning for Human Activity Recognition Using Wearable Sensors // Knowledge-Based Systems. 2023. Vol. 260. Article 110143.

Nauchnyy servis v seti Internet: trudy XXVI Vserossiyskoy nauchnoy konferentsii (September 22–25, 2025, online). Moscow: Keldysh Institute of Applied Mathematics, 2025 (in press).

Nath N.D., Behzadan A.H., Paal S.G. Deep Learning for Site Safety: Real-Time Detection of Personal Protective Equipment // Automation in Construction. 2020. Vol. 112. Article 103085.

Gupta S. Deep Learning-Based Human Activity Recognition Using Wearable Sensor Data // International Journal of Information Management Data Insights. 2021. Vol. 1. Article 100046.

Uçar A., Karakoşe M., Kırımça N. Artificial Intelligence for Predictive Maintenance Applications: Key Components, Trustworthiness, and Future Trends // Applied Sciences. 2024. Vol. 14, No. 2. Article 898.

Wu Z., Zhao J., Shen H. Smart Home Automation Based on Human Activity Recognition: A Survey // Future Generation Computer Systems. 2023. Vol. 137. P. 41–57.

Han S., Yuan S., Trabelsi M. LogGPT: Log Anomaly Detection via GPT [Electronic resource] // arXiv. 2023. Available at: https://arxiv.org/pdf/2309.14482

(accessed: 15.05.2025).

Sharma R., Patel N. Deep Learning-Based Anomaly Detection in Surveillance Videos // Journal of Visual Communication and Image Representation. 2022. Vol. 86. Article 103624.

Özüağ S., Ertuğrul Ö. Enhanced Occupational Safety in Agricultural Machinery Factories: Artificial Intelligence-Driven Helmet Detection Using Transfer Learning and Majority Voting // Applied Sciences. 2024. Vol. 14. Article 11278. https://doi.org/10.3390/app142311278.

Li X., Chen Y., Hu L. Real-Time Workplace Activity Recognition Using Deep Learning Models // IEEE Transactions on Industrial Informatics. 2023. Vol. 19, No. 2. P. 1520–1532.

Wu Z., Zhao J., Shen H. Smart Home Automation Based on Human Activity Recognition: A Survey // Future Generation Computer Systems. 2023. Vol. 137. P. 41–57.

Ollama [Electronic resource]. Available at: https://ollama.com/ (accessed: 30.03.2025).

Ollama API Documentation [Electronic resource]. Available at: https://github.com/ollama/ollama/blob/main/docs/api.md (accessed: 30.03.2025).

Sahoo P., Singh A.K., Saha S., Jain V., Mondal S., Chadha A. A Systematic Survey of Prompt Engineering in Large Language Models: Techniques and Applications [Electronic resource] // arXiv. 2024. Available at: https://arxiv.org/pdf/2402.07927 (accessed: 15.05.2025).

Ollama Python Library [Electronic resource]. Available at: https://github.com/ollama/ollama-python (accessed: 30.03.2025)

ISO 8601-1:2019 Standard [Electronic resource]. Available at: https://www.iso.org/obp/ui/#iso:std:iso:8601:-1:ed-1:v1:en (accessed: 30.03.2025).

OpenAI ChatGPT-4o-mini [Electronic resource]. Available at: https://chatgpt.com/ (accessed: 30.03.2025).

Ollama Gemma3:12B Model [Electronic resource]. Available at: https://ollama.com/library/gemma3:12b (accessed: 30.03.2025).

Ollama LLaVA:13B Model [Electronic resource]. Available at: https://ollama.com/library/llava:13b (accessed: 30.03.2025).

Ollama Llama3.2-Vision:11B Model [Electronic resource]. Available at: https://ollama.com/library/llama3.2-vision (accessed: 30.03.2025).

Ollama MiniCPM-V:8B Model [Electronic resource]. Available at: https://ollama.com/library/minicpm-v (accessed: 30.03.2025).

Ollama Qwen2.5-VL:7B Model [Electronic resource]. Available at: https://ollama.com/library/qwen2.5vl (accessed: 16.01.2026).

Ollama Mistral-Small-3.2 Model [Electronic resource]. Available at: https://ollama.com/library/mistral-small3.2 (accessed: 16.01.2026).

Hand D.J., Christen P. F*: An Interpretable Transformation of the Measure // Journal of Classification. 2021. Vol. 38, No. 1. P. 3–17.

Scikit-learn F1-Score [Electronic resource]. Available at: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html (accessed: 30.03.2025).

The authors declare that there are no conflicts of interest present.