AgentTrove: how to use the 1.7M agent trace dataset in Python
AgentTrove is the largest open dataset of 1.7 million AI agent interaction traces in ShareGPT format. A new Python tutorial shows how to stream the data without

AgentTrove — the largest open-source dataset of AI agent interaction traces with 1.7 million examples in ShareGPT format. A new Python tutorial shows how to efficiently work with data for training your own agents.
What is AgentTrove
AgentTrove collects real trajectories of various AI agents into a single open resource. Each example demonstrates the complete sequence: how an agent reads instructions, parses the task, executes actions, processes results, and reflects on errors. This detailed level of information allows researchers to explore task-solving logic and understand what strategies modern systems employ. The dataset includes work from different types of agents — from simple rule-based systems to complex multi-step solvers. This diversity is important for a comprehensive understanding of how agents work and evolve. The ShareGPT format ensures compatibility with popular training tools, from Hugging Face to specialized LLM frameworks.
Key Features
- Data streaming — load data in chunks without needing to download the entire dataset into memory
- Turn normalization — bringing agent interactions to a unified standard format for consistent analysis
- Strategy and pattern analysis — built-in tools for extracting commands and exploring task-solving paths
- Filtering successful traces — selecting only examples with correct task solutions, filtering out failed attempts
- Export to SFT format — ready-made dataset for supervised fine-tuning language models without additional preparation
How to Use in Practice
The Python tutorial published alongside the dataset shows a step-by-step process for working with AgentTrove. The first step is to initialize data streaming, which allows working without full loading into memory. This is especially important when working with a dataset of this size, where full loading could require tens of gigabytes of RAM and unjustifiably slow down the start of analysis.
The next stage is turn normalization. Agents can interact with the system differently depending on the implementation, and bringing them to a unified format simplifies subsequent analysis and behavior comparison. Then commands are extracted: what actions the agent performed, in what order, how it responded to errors, when it changed strategy, what typical sequences appear frequently.
Trajectory analysis reveals deep patterns: which approaches work most often and lead to success, where typical failures occur, how the agent adapts to new conditions and unforeseen obstacles. This is especially useful for understanding failure modes — places where systems often get stuck. The final step is filtering successful examples and exporting into a clean SFT dataset for training your own models without noise and erroneous trajectories.
What It Means
AgentTrove significantly lowers the barrier to entry for developing your own AI agents. Instead of collecting examples from scratch, researchers and developers can now rely on 1.7 million ready-made trajectories from various domains. This will enable faster iteration when creating smarter, more reliable, and more efficient agent systems.
Хотите не читать про ИИ, а внедрить его?
«AI News» — это полезные новости из мира ИИ. Системно научиться работать с нейросетями и применять их в работе — в Hamidun Academy.