Regex from a local LLM: Bitrix24's experience without fine-tuning
Bitrix24 uses a local LLM on a Mac Mini to generate Regex for log parsing. Instead of fine-tuning the model, it uses a script that saves and applies …
AI-processed from Habr AI; edited by Hamidun News
As log volumes grow, efficient log analysis has become critical for keeping IT infrastructure stable and secure. Bitrix24 addresses this by using a locally hosted large language model (LLM) to generate regular expressions (Regex) automatically. Instead of the costly, labor-intensive route of fine-tuning a neural network on proprietary data, Bitrix24 built a system in which the LLM generates the Regex and a script saves and applies those rules automatically. This saves resources and keeps data secure, since all computation stays within the company's perimeter.
Writing Regex for log parsing by hand is routine, labor-intensive work that requires solid knowledge of both regular expression syntax and the structure of the logs. It can take hundreds of hours of manual debugging, especially when there are many diverse log files. Cloud APIs can simplify Regex generation, but they carry the risk of sending confidential data to third-party services, and they can become expensive at large data volumes.
The system's architecture centers on an LLM deployed locally on a Mac Mini. The model receives a description of the log structure and a parsing task, then generates the corresponding regular expression. A script written by Bitrix24 specialists automatically saves the generated Regex and uses it to parse logs. The key point is that the LLM is used out of the box, without fine-tuning on Bitrix24-specific data; the effort goes instead into the script that manages Regex generation and application.
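The generate, save, and apply loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Bitrix24's actual script: the source does not show its code, so the function names (`get_or_generate_regex`, `parse_lines`), the JSON rule file, the sample log format, and the `fake_local_llm` stand-in for the model call are all hypothetical.

```python
import json
import re
import tempfile
from pathlib import Path
from typing import Callable, Dict, Iterable, Iterator


def load_rules(path: Path) -> Dict[str, str]:
    """Load previously saved regexes, keyed by log type (empty if none saved)."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return {}


def get_or_generate_regex(
    log_type: str,
    description: str,
    generate: Callable[[str], str],
    path: Path,
) -> re.Pattern:
    """Reuse the saved regex for this log type, or ask the model once and save it."""
    rules = load_rules(path)
    if log_type not in rules:
        candidate = generate(description).strip()
        re.compile(candidate)  # raises re.error if the model output is not a valid regex
        rules[log_type] = candidate
        path.write_text(json.dumps(rules, indent=2), encoding="utf-8")
    return re.compile(rules[log_type])


def parse_lines(pattern: re.Pattern, lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield the named groups of every line the pattern matches; skip the rest."""
    for line in lines:
        match = pattern.search(line)
        if match:
            yield match.groupdict()


def fake_local_llm(prompt: str) -> str:
    """Hypothetical stand-in for the local model call; returns a canned regex."""
    return (
        r"(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
        r"(?P<level>[A-Z]+) (?P<message>.*)"
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = Path(tmp) / "rules.json"  # assumed storage format: one JSON file
        pattern = get_or_generate_regex(
            "app", "timestamp, level, free-text message", fake_local_llm, store
        )
        for row in parse_lines(pattern, ["2024-01-01 12:00:00 ERROR Disk full"]):
            print(row)
```

In this sketch the model is called only when no rule exists for a log type; later runs read the saved pattern from disk, which is one plausible way to get stable behavior from an unmodified model.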
The approach has three main advantages. First, resource savings: with no fine-tuning, there are no costs for training compute or for maintaining a dataset. Second, security: all computation happens inside the company, so logs are never sent to third-party services. Third, flexibility and scalability: the system adapts to new log types and can be scaled to handle large data volumes.
According to the article, the system significantly reduced the time Bitrix24 spends on log analysis and made its IT specialists more efficient. Automatic Regex generation frees them for more complex tasks such as anomaly analysis and security threat detection. The case shows that a local LLM can be an effective tool for practical tasks without any fine-tuning.
The approach could carry over to other companies that need to analyze large volumes of data, in areas such as application performance monitoring, fraud detection, and user behavior analysis. Its success depends largely on the quality of the script that manages Regex generation and application, so companies planning a similar system should focus on developing and optimizing that script.
In conclusion, Bitrix24 has shown that a local LLM can automate Regex creation while saving resources and keeping data inside the company. Further tools and methods that use local LLMs to automate other tasks are likely to follow.