HKUDS Details OpenSpace, a Self-Evolving Skill Engine for AI Agents
AI-processed from MarkTechPost; edited by Hamidun News
HKUDS has published a detailed breakdown of OpenSpace, an open-source skill engine that lets AI agents accumulate working templates and reuse them instead of starting each task from scratch. The material covers the full cycle, from configuring an OpenAI model to comparing "cold" and "warm" starts, where skill reuse reduces token consumption and improves response quality.
From cold start
OpenSpace is installed from its GitHub repository and, in the tutorial, configured for OpenAI models, including gpt-4o-mini. The agent then receives its first task without any pre-built skill library. The system executes it like a regular LLM agent, but along the way it captures successful steps, workarounds, and working instructions that can be reused later.
Skills do not live only in the dialogue context: they are saved as files following the SKILL.md convention and recorded in a SQLite database, so they can be viewed, versioned, and analyzed separately from the run itself. The authors then launch a similar task again and show the difference between a cold start and a warm start.
On the warm start, OpenSpace no longer reasons from scratch but selects appropriate skills through a hybrid search that combines BM25 and embeddings. The demo also creates three basic skills manually: data validation, report generation with fallback scenarios, and an error recovery mechanism. This seeds useful patterns in advance so that the evolutionary engine can then grow them on real tasks.
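The storage idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the table layout, the directory layout, and the helper names `open_index`, `save_skill`, and `latest_version` are hypothetical and are not taken from the OpenSpace codebase.

```python
import sqlite3
import tempfile
from pathlib import Path


def open_index():
    """Create an in-memory SQLite index (hypothetical schema, not OpenSpace's)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE skills ("
        "name TEXT, version INTEGER, path TEXT, "
        "PRIMARY KEY (name, version))"
    )
    return db


def save_skill(db, root, name, version, body):
    """Write <root>/<name>/SKILL.md and record the version in the index.

    The file always holds the latest body; the table keeps one row per version.
    """
    skill_dir = Path(root) / name
    skill_dir.mkdir(parents=True, exist_ok=True)
    path = skill_dir / "SKILL.md"
    path.write_text(body, encoding="utf-8")
    db.execute(
        "INSERT INTO skills (name, version, path) VALUES (?, ?, ?)",
        (name, version, str(path)),
    )
    db.commit()
    return path


def latest_version(db, name):
    """Return the highest recorded version of a skill, or None if unknown."""
    row = db.execute(
        "SELECT MAX(version) FROM skills WHERE name = ?", (name,)
    ).fetchone()
    return row[0]


if __name__ == "__main__":
    db = open_index()
    root = tempfile.mkdtemp()
    save_skill(db, root, "data-validation", 1, "# Data validation\n")
    print(latest_version(db, "data-validation"))
```

Keeping the skill body in a file and the metadata in a database is what makes it possible to inspect and version skills independently of any single agent run.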
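A hybrid retrieval step of this kind could look roughly like the sketch below. It is not OpenSpace's implementation: the BM25 part is a plain textbook version, the "embedding" score is a toy stand-in (cosine similarity over character trigram counts, used here only so the example runs without a model), and the weight `alpha`, the skill descriptions, and all function names are assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of every document for the query (non-negative IDF variant)."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


def trigrams(text):
    s = text.lower()
    return Counter(s[i:i + 3] for i in range(len(s) - 2))


def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def normalize(scores):
    """Scale scores to [0, 1] by the maximum so both signals are comparable."""
    hi = max(scores)
    return [s / hi if hi > 0 else 0.0 for s in scores]


def hybrid_rank(query, skills, alpha=0.5):
    """Rank skills by alpha * lexical score + (1 - alpha) * similarity score."""
    names = list(skills)
    docs = [skills[name] for name in names]
    lexical = normalize(bm25_scores(query, docs))
    q = trigrams(query)
    # Toy stand-in for embedding similarity; a real system would use a model.
    semantic = normalize([cosine(q, trigrams(d)) for d in docs])
    fused = [alpha * l + (1 - alpha) * s for l, s in zip(lexical, semantic)]
    return sorted(zip(names, fused), key=lambda pair: pair[1], reverse=True)


# Hypothetical descriptions for the three seed skills named in the article.
SKILLS = {
    "data-validation": "validate input data schema and types before processing",
    "report-generation": "generate a report with fallback formats when export fails",
    "error-recovery": "retry failed tool calls and recover from errors",
}

if __name__ == "__main__":
    for name, score in hybrid_rank("generate a report with fallback", SKILLS):
        print(f"{name}: {score:.2f}")
```

Combining the two signals is a common design choice: BM25 rewards exact term matches in the task description, while a similarity score can still surface a skill whose wording differs from the query.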
How skills grow
The key idea of OpenSpace is that a skill is treated as a living entity rather than a static prompt. After each execution, the system analyzes what worked, where failures occurred, which tools degraded, and what can be improved in the next version. As a result, the engine not only accumulates successful scenarios but can also fix them, specialize them for new tasks, and extract new skills from individual successful runs.
- FIX: fixes a broken or outdated skill without changing its role.
- DERIVED: creates a derived version for a narrower scenario or new class of tasks.
- CAPTURED: extracts a new reusable pattern directly from successful execution.
- BM25 + embeddings: helps quickly find the most relevant skill for a task description.
- open-space.cloud: provides a shared catalog where skills can be searched, downloaded, uploaded, and shared between teams.
An important part of the architecture is the collective layer. Through a cloud community, agents can exchange already-evolved skills, view version history, and build shared repositories for teams. The article presents this as a transition from a single assistant to a network of agents that learn not just from their own mistakes. If one agent finds a reliable workaround for PDF generation, table parsing, or web research, another can take that skill and not repeat the same cycle of trials and failures.
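The three evolution modes listed above (FIX, DERIVED, CAPTURED) can be pictured as operations that each produce a new skill record with a pointer to its origin. The sketch below is a minimal model under stated assumptions: the `SkillVersion` record, the `name@version` reference format, and the helper functions are hypothetical and do not describe how OpenSpace actually stores lineage.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Evolution(Enum):
    FIX = "fix"
    DERIVED = "derived"
    CAPTURED = "captured"


@dataclass(frozen=True)
class SkillVersion:
    name: str
    version: int
    body: str
    origin: Evolution
    parent: Optional[str] = None  # hypothetical "name@version" reference


def ref(skill):
    return f"{skill.name}@{skill.version}"


def capture(name, body):
    """CAPTURED: a new skill extracted from a successful run; it has no parent."""
    return SkillVersion(name, 1, body, Evolution.CAPTURED)


def fix(skill, new_body):
    """FIX: same name and role, next version, parent is the repaired skill."""
    return SkillVersion(skill.name, skill.version + 1, new_body,
                        Evolution.FIX, ref(skill))


def derive(skill, new_name, new_body):
    """DERIVED: a new, narrower skill that remembers where it came from."""
    return SkillVersion(new_name, 1, new_body, Evolution.DERIVED, ref(skill))


if __name__ == "__main__":
    base = capture("report-generation", "Render the report as PDF.")
    fixed = fix(base, "Render as PDF; fall back to HTML if export fails.")
    narrow = derive(fixed, "compliance-report", "Render a compliance form.")
    for skill in (base, fixed, narrow):
        print(ref(skill), skill.origin.value, skill.parent)
```

Recording the parent on every record is what would make a version history browsable, which the shared catalog described above relies on.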
OpenSpace token economics
The strongest part of the material is not the setup but the numbers.
The OpenSpace repository includes a GDPVal benchmark run on a selection of 50 professional tasks across six categories: documents, compliance forms, media, engineering, spreadsheets, and strategy. For a fair comparison, OpenSpace was evaluated against a basic ClawWork agent on the same Qwen 3.5-Plus backbone model, so the difference can be attributed to skill accumulation rather than to a model swap.
The result: 4.2x higher economic return, 70.8% average quality, and 45.9% fewer tokens on repeat task runs. The breakdown by category shows where self-evolution delivers the most. Warm runs cut token consumption by 56% in documents and correspondence, by 51% in forms and compliance tasks, by 46% in media, and by 37% in spreadsheets.
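To make the percentages concrete, the short calculation below converts the reported reductions into warm-run token counts. The cold-run budget of 100,000 tokens is a made-up figure chosen only for illustration; the reduction percentages are the ones quoted in the article.

```python
# Token reductions on warm runs, as quoted in the article.
REDUCTION = {
    "documents": 0.56,
    "compliance": 0.51,
    "media": 0.46,
    "spreadsheets": 0.37,
    "overall": 0.459,
}

COLD_TOKENS = 100_000  # hypothetical cold-run budget, not a benchmark figure


def warm_tokens(cold_tokens, reduction):
    """Tokens left on a warm run after applying a fractional reduction."""
    return round(cold_tokens * (1 - reduction))


if __name__ == "__main__":
    for category, reduction in REDUCTION.items():
        print(f"{category}: {warm_tokens(COLD_TOKENS, reduction)} tokens")
```

Under this assumed budget, a document task that cost 100,000 tokens cold would cost about 44,000 tokens warm.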
At the same time, the authors emphasize that most of the 165 evolved skills concern execution reliability rather than narrow domain expertise: file format handling, error recovery, document generation, and quality checks. In other words, the main benefit comes not from "domain knowledge" but from the agent no longer breaking at typical technical points.
What this means
OpenSpace illustrates where agent frameworks are heading: from one-off prompts to persistent working memory, where each completed task makes the system cheaper and more robust. For product teams, this signals that the next wave of efficiency gains will come not only from new models but also from the infrastructure of reusable skills built around them.