
Open-source LLMs for Lawyers: The Reg.cloud and Raft Experiment

Reg.cloud and Raft ran an experiment on using open-source LLMs to analyze legal documents. The article covers the limitations and the engineering solutions.

Source: Habr AI. Collage: Hamidun News.

Document automation in the legal field has come a long way, from regular expressions to modern neural networks. In practice, however, either the processing quality fell short of real business needs or the cost of implementing and maintaining a solution was prohibitive. Looking for a better trade-off, the Reg.cloud team awarded the Raft team a grant to run an experiment with modern open-source LLMs on cloud servers with A100 GPUs. The goal was to determine how well LLMs handle long legal documents and whether they can be used for production-scale extraction of business-critical data.

During the experiment, the Raft team ran into a number of limitations. First, even the most modern LLMs have a limited context length. Legal documents are often very long, so processing them requires either splitting them into chunks or using context-extension techniques. Second, extraction accuracy depends directly on how well the model was trained and how well it understands legal terminology. Models trained on general-purpose data may struggle with specialized legal texts.

To address these problems, the Raft team applied several engineering solutions. Long documents were processed with chunking (splitting the text into fragments) and summarization (compressing the information). The team also fine-tuned models on specialized legal datasets and paid special attention to selecting model parameters and configuring the data extraction process.
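The chunking step described above can be illustrated with a minimal sketch. It is not the Raft team's implementation: the function names `chunk_words` and `merge_extractions`, the word-based window (a rough stand-in for token counts), the overlap size, and the "first non-empty value wins" merge rule are all assumptions made for illustration. The `extract_chunk` callable is a hypothetical placeholder for whatever LLM call performs the extraction.

```python
from typing import Callable, Dict, List


def chunk_words(text: str, max_words: int = 200, overlap: int = 20) -> List[str]:
    """Split text into overlapping windows of at most max_words words.

    Words are a crude proxy for tokens; a real pipeline would count
    tokens with the model's own tokenizer.
    """
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must satisfy 0 <= overlap < max_words")

    words = text.split()
    step = max_words - overlap  # how far the window advances each time
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def merge_extractions(
    chunks: List[str],
    extract_chunk: Callable[[str], Dict[str, str]],
) -> Dict[str, str]:
    """Run a per-chunk extractor and keep the first non-empty value per field.

    extract_chunk is a hypothetical stand-in for an LLM call that returns
    a mapping of field name to extracted value for one chunk.
    """
    merged: Dict[str, str] = {}
    for chunk in chunks:
        for field, value in extract_chunk(chunk).items():
            if value and field not in merged:
                merged[field] = value
    return merged


if __name__ == "__main__":
    # Toy extractor used only to exercise the pipeline without a model.
    def toy_extractor(chunk: str) -> Dict[str, str]:
        return {"lessor": "Acme LLC"} if "Acme" in chunk else {}

    document = "This lease is made between Acme LLC and the tenant " * 50
    parts = chunk_words(document, max_words=100, overlap=10)
    print(len(parts), merge_extractions(parts, toy_extractor))
```

The overlap exists so that a clause straddling a chunk boundary appears whole in at least one window. The merge rule is deliberately naive, and conflicting values across chunks would need a more careful policy in practice.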

The results were promising but not without drawbacks. The LLMs extracted key information from legal documents well, but accuracy and completeness varied with document type and task complexity. Models fine-tuned on specialized data performed best, yet even then the results had to be verified manually to ensure high accuracy.
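One common way to put numbers on "accuracy" and "completeness" is field-level precision and recall against manually verified labels. The sketch below assumes that mapping. The article does not say which metrics the team used, and the function name `precision_recall` and the field names are hypothetical.

```python
from typing import Dict, Tuple


def precision_recall(
    predicted: Dict[str, str],
    gold: Dict[str, str],
) -> Tuple[float, float]:
    """Compare extracted fields with manually verified ones.

    precision: share of extracted fields whose value matches the label
               (a proxy for accuracy).
    recall:    share of labeled fields that were extracted correctly
               (a proxy for completeness).
    """
    correct = sum(
        1 for field, value in predicted.items()
        if field in gold and gold[field] == value
    )
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    # Illustrative values only; not data from the experiment.
    predicted = {"party": "Acme LLC", "date": "2024-01-01", "amount": "100"}
    gold = {
        "party": "Acme LLC",
        "date": "2024-01-02",
        "amount": "100",
        "term": "1 year",
    }
    print(precision_recall(predicted, gold))
```

Exact string matching is strict: a date in a different format counts as an error, so a real evaluation would usually normalize values before comparing them.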

The experiment has important implications for the legal industry. It shows that open-source LLMs can be a useful tool for automating legal document processing, provided they are carefully tuned and adapted to specific tasks. As the technology advances and more powerful models emerge, results can be expected to improve significantly and LLMs to see wider use in legal practice.

In conclusion, the Reg.cloud and Raft experiment demonstrated the potential of open-source LLMs for automating work with legal documents. Despite certain limitations and complexities, engineering solutions and model fine-tuning make acceptable results achievable. Further research and development in this field can open new opportunities to improve efficiency and reduce costs in the legal sector.

Hamidun News
AI news without noise. Daily editorial selection from 400+ sources. A product by Zhemal Khamidun, Head of AI at Alpina Digital.