Agentis Memory: Redis-Compatible Storage with Vector Search and Local Embeddings
Agentis Memory is a Redis-compatible store for the shared memory of AI agents, with built-in semantic search and local embeddings. The project operates…
AI-processed from Habr AI; edited by Hamidun News
Agentis Memory proposes a simple but important idea for the AI-agent market: a shared working memory that behaves like regular Redis. Instead of a separate vector database, an external embedding API, and custom SDKs, the project combines key-value storage, semantic search, and local embedding computation in a single process. For teams building multi-agent systems, it is an attempt to solve one of the most painful problems: exchanging context between agents without extra network layers and the latency they add.
The problem surfaced while investigating a real production incident. When several specialized agents work in parallel on logs, metrics, conversations, and incident history, each sees only its own fragment of the picture. One agent might find OOMKilled in the logs and trace the actual root cause, while the others keep building their own hypotheses around CPU spikes, a recent deployment, or some other correlation.
The synthesizing agent ends up collecting several conflicting hypotheses, many of which are just noise. Storing such findings in a shared markdown file doesn't help: concurrent writes conflict, and there is no TTL, no structure, and no semantic search. For an agent system, that is not enough.
A survey of existing solutions showed the same problem from another angle. Mem0 and Zep already position themselves as memory layers for AI agents, but they come with REST APIs, separate SDKs, vector storage, and external embedding services. Redis Stack is closer to the desired model because it stays compatible with Redis clients, but embeddings still have to be computed outside the server.
For long-term RAG this is tolerable, but for working memory, where one agent saves a fact and another must find it within milliseconds, such a setup is too heavy. Each extra network hop costs both latency and reliability. The first engineering hypothesis was obvious: fork Redis itself and embed ONNX Runtime and a vector index inside it.
In practice, this path quickly ran into hard problems: C code, native libraries, manual memory management, and instability under concurrent requests. After a failed prototype, the project was rewritten from scratch in Java 25 and compiled with GraalVM native-image. The result is a single native binary of about 150 MB with the embedding model already bundled.
Internally it uses the Java Vector API for SIMD-accelerated cosine similarity, Project Loom virtual threads, ONNX Runtime for local inference of the all-MiniLM-L6-v2 model, and the jvector library for HNSW nearest-neighbor search. From the outside, Agentis Memory behaves like an ordinary Redis server: it supports more than 90 standard commands, TTL, SCAN, and basic pub/sub, and it works with regular clients such as redis-py, Jedis, ioredis, or go-redis.
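Ordinary clients can work unchanged because of how the Redis protocol is built: RESP frames every command, standard or not, as an array of bulk strings, so a client does not need to know a command in advance to send it. The sketch below is a minimal RESP2 encoder in plain Python, written for illustration rather than taken from the project; it assumes Agentis Memory accepts the same framing as Redis, which its compatibility with redis-py and Jedis implies. The key and value in the usage line are hypothetical.

```python
from __future__ import annotations


def encode_command(*args: str | bytes) -> bytes:
    """Frame one command as a RESP2 array of bulk strings, as Redis clients do."""
    out = [b"*%d\r\n" % len(args)]  # array header: number of elements
    for arg in args:
        data = arg.encode("utf-8") if isinstance(arg, str) else arg
        # Bulk string: "$<byte length>\r\n<payload>\r\n"; the length counts bytes, not characters.
        out.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(out)


def encode_pipeline(commands: list[tuple[str | bytes, ...]]) -> bytes:
    """A pipeline is just several frames written back-to-back before any reply is read."""
    return b"".join(encode_command(*cmd) for cmd in commands)


# Hypothetical key and value, for illustration only.
frame = encode_command("SET", "incident:42:finding", "OOMKilled in payments pod")
print(frame)
```

In redis-py the same mechanism is exposed as `execute_command`, so a command that has no dedicated helper method can still be sent as `r.execute_command(name, *args)`. Pipelining, as in the benchmarks at depth 100, is this framing repeated: the client writes many frames before reading the replies.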
The key difference is a set of four additional memory commands. MEMSAVE takes text, chunks it by sentence, computes 384-dimensional vectors, and indexes them asynchronously, usually within 5-10 milliseconds per chunk. MEMQUERY takes a natural-language query and returns the nearest records by cosine similarity.
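To make the MEMSAVE/MEMQUERY flow concrete, here is an in-memory Python sketch of the same steps: split text into sentences, turn each sentence into a vector, and rank stored chunks by cosine similarity to a query. It is not the project's code and makes several deliberate substitutions: a sparse word-count vector stands in for the 384-dimensional all-MiniLM-L6-v2 embeddings, so it only matches shared words; brute-force scoring replaces the jvector HNSW index; indexing is synchronous; and `ToyMemory`, its methods, and the key name are hypothetical.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def split_sentences(text: str) -> list[str]:
    # Naive chunker: break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def toy_embed(text: str) -> Counter:
    # Stand-in for all-MiniLM-L6-v2: a sparse bag-of-words vector.
    # It has no semantic understanding; it only counts lowercase word tokens.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity = dot(a, b) / (|a| * |b|); missing words count as 0.
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyMemory:
    """Brute-force sketch of MEMSAVE/MEMQUERY semantics (no HNSW, no async indexing)."""

    def __init__(self) -> None:
        self.chunks: list[tuple[str, str, Counter]] = []  # (key, sentence, vector)

    def memsave(self, key: str, text: str) -> int:
        # Chunk by sentence and index every chunk under the caller's key.
        sentences = split_sentences(text)
        self.chunks.extend((key, s, toy_embed(s)) for s in sentences)
        return len(sentences)

    def memquery(self, query: str, k: int = 3) -> list[tuple[str, str, float]]:
        # Score every stored chunk against the query and return the k best.
        q = toy_embed(query)
        scored = [(key, s, cosine(q, v)) for key, s, v in self.chunks]
        scored.sort(key=lambda t: t[2], reverse=True)
        return scored[:k]


mem = ToyMemory()
mem.memsave("incident:42:logs", "Pod payments-7f was OOMKilled at 03:12. CPU stayed below 40 percent.")
top = mem.memquery("why was the pod OOMKilled", k=1)
print(top[0][1])  # Pod payments-7f was OOMKilled at 03:12.
```

Brute force costs one similarity computation per stored chunk per query, which is what an HNSW index avoids as memory grows. A real embedding model is also what lets a query match a paraphrase that shares no words with the stored text, something this toy cannot do.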
MEMSTATUS shows whether the index is ready for a specific key, and MEMDEL deletes data from both the key-value layer and the vector index at once. For a developer, this looks like a minimal extension of the familiar Redis model rather than a separate platform with its own ecosystem.
The performance story was also instructive. The first Java version ran roughly twice as slow as Redis. After switching to GraalVM native-image and rewriting the hot path with the Vector API, the situation reversed: string operations grew from roughly 60 thousand to 168 thousand ops/sec, about 1.36x Redis.
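The string-throughput figures can be cross-checked with a little arithmetic. The source does not give the Redis number directly, so the sketch below assumes that "1.36x Redis" means Agentis Memory's ops/sec divided by Redis's ops/sec in the same benchmark. Under that assumption the implied Redis baseline is about 123.5 thousand ops/sec, and the first Java version's 60 thousand ops/sec works out to about 0.49x Redis, consistent with "roughly twice as slow".

```python
# Back-of-envelope check of the quoted string-throughput numbers.
# Assumption: "1.36x Redis" = Agentis Memory ops/sec / Redis ops/sec, same benchmark.
first_java_ops = 60_000      # first Java version, approximate
native_image_ops = 168_000   # after the GraalVM native-image + Vector API rewrite
ratio_vs_redis = 1.36

implied_redis_ops = native_image_ops / ratio_vs_redis
first_vs_redis = first_java_ops / implied_redis_ops
rewrite_speedup = native_image_ops / first_java_ops

print(f"implied Redis baseline: {implied_redis_ops:,.0f} ops/sec")  # 123,529
print(f"first Java version vs Redis: {first_vs_redis:.2f}x")       # 0.49x
print(f"speed-up from the rewrite: {rewrite_speedup:.1f}x")         # 2.8x
```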
In a mixed workload the result was around 1.40x. At pipeline depth 100, the system reached 3.19 million operations per second, about 1.71x Redis, thanks to its multi-threaded architecture with no single-threaded event loop. The trade-off remains, though: on p99 latency for strings, Redis is still ahead at 3.82 milliseconds versus 6.27 for Agentis Memory, the price paid for garbage collection.
Beyond raw performance, the project places special emphasis on privacy and cost.
Embeddings are computed locally by ONNX Runtime inside the process, with no API keys, no calls to external services, and no logs, metrics, or service traffic sent to the cloud. For systems that handle incidents and internal infrastructure, this is not a cosmetic improvement but an important architectural decision. Local inference takes about 2-5 milliseconds per chunk, adds no separate embedding bill, and removes the dependency on someone else's uptime.
The more sensitive the data and the more frequent the access, the more noticeable the benefits of this approach. More broadly, Agentis Memory illustrates how the infrastructure around AI agents is changing: simply plugging in an LLM, tools, and an orchestrator is no longer enough to compete.
The next point of competition is shared memory, the speed of context synchronization, and the system's ability to discard false hypotheses quickly. If a Redis-compatible model with local embeddings gains traction in real workloads, such solutions could become for agent systems what Redis long ago became for conventional backend developers: a fast coordination layer, cache, and shared working memory.