Stanford introduced OpenJarvis — a local AI agent stack with memory and learning
Stanford released OpenJarvis — an open-source framework for personal AI agents that run locally on a laptop or PC. The project includes not only model…
AI-processed from MarkTechPost; edited by Hamidun News
Researchers from Stanford have released OpenJarvis — an open-source framework for personal AI agents that operate entirely on the user's device. The project is conceived as a ready-made stack for local AI: from running models and orchestrating agents to memory, tools, benchmarks, and subsequent training on local data.
Why This Matters
Most personal AI systems to date look local only on the surface: the interface runs on a laptop, but the core reasoning goes to cloud APIs. For tasks involving files, notes, emails, and persistent user context, this means latency, recurring costs, and unnecessary transfer of sensitive data. OpenJarvis proposes a different model: local execution by default, with the cloud as an option only when it's truly necessary.
At Stanford, the release is connected to their own work on Intelligence Per Watt. According to the lab, local language models and local accelerators are already capable of correctly serving 88.7% of single-turn chat and reasoning requests at interactive response speeds, and the efficiency by the "intelligence per watt" metric has increased 5.3 times from 2023 to 2025. The idea behind OpenJarvis is that the hardware and models are nearly ready, but the market was lacking a unified software layer for such systems.
How the Stack Works
OpenJarvis is built around five primitives that can be replaced, tested, and optimized independently of each other. This approach is meant to eliminate the typical confusion in local AI setups, where inference, agent logic, tool handling, memory, and learning are intertwined in one difficult-to-reproduce project. As a result, developers can compare not the entire system as a whole, but a specific layer — the model, engine, memory, or agent behavior. This makes experiments and production deployment considerably simpler.
- Intelligence — a models layer with a unified catalog of local LLMs and abstraction over their selection.
- Engine — a runtime for execution via Ollama, vLLM, SGLang, llama.cpp, and other engines.
- Agents — agent roles, including Orchestrator for task decomposition and Operative for recurring scenarios.
- Tools & Memory — access to tools, local memory, semantic search, MCP, and agent-to-agent communication through A2A.
- Learning — an improvement loop that uses local traces for fine-tuning and optimization.
Special emphasis is placed on the system not being limited to chat. OpenJarvis can work with local search across notes and documents, connect tools like web search, calculator, file input/output and code interpretation, as well as communicate with external MCP servers. Because of this, the framework is positioned not as a wrapper around a single model, but as infrastructure for a personal agent with long-term memory and access to the user's real environment.
What's Already Available
From a practical standpoint, the project looks quite grounded. OpenJarvis has a CLI, Python SDK, browser interface, and desktop applications for macOS, Windows, and Linux. The `jarvis init` command determines available hardware and recommends an appropriate engine and model combination, `jarvis doctor` helps diagnose configuration, and `jarvis serve` raises an OpenAI-compatible API server on FastAPI so developers can connect existing clients and frontends with minimal changes. Basic scenarios, according to the documentation, can work without network at all.
Another strong point is measuring efficiency, not just response quality. The framework collects telemetry on energy, latency, FLOPs, and monetary cost of a request, supports profiling on NVIDIA, AMD, and Apple Silicon, and standardizes benchmarks through `jarvis bench`. At the same time, OpenJarvis preserves local traces of interactions: from prompt-completion pairs to sequences of agent actions and tool calls. On this basis, one can optimize not only model weights, but also prompts, agent logic, and the inference engine itself — for example, through quantization, DSPy, GEPA, SFT, DPO, or GRPO.
What This Means
OpenJarvis shows that local AI is shifting from experimental setups toward a full-fledged engineering stack. If Stanford's approach catches on, developers will get a standard foundation for personal agents that store data with the user, are cheaper to operate, and become more useful over time through training on their own local scenarios. For the market, this is another signal: part of everyday AI tasks will soon begin to migrate from the cloud to personal devices.
Want to stop reading about AI and start using it?
AI News is a curated feed of AI/tech news. Hamidun Academy teaches you to use AI systematically in your work.