NeuReality taps former Google AI product leader to accelerate NR-NEXUS launch

NeuReality is bolstering the market entry of NR-NEXUS by bringing in Shalini Agarwal, who previously led AI products at Google, as a strategic advisor. The startup is building an operating system for AI inference that spans CPU, GPU, and network infrastructure and pools disparate clusters into unified "token factories." The broader trend is clear: AI capital is shifting rapidly from model training to inference at industrial scale.

Khamidun Zhemal

AI monitoring · TNW

Apr 30, 2026 · 3 min

AI-processed from TNW; edited by Hamidun News


NeuReality has appointed Shalini Agarwal as a strategic advisor to accelerate the market launch of NR-NEXUS, its operating system for AI inference. For the Israeli startup, this is more than a new hire: the company is trying to carve out a niche between expensive GPU clusters and enterprise customers who need a managed layer on top of fragmented infrastructure.

Why the advisor is needed

Agarwal is joining not in an operational role but as a senior strategic advisor. She previously led AI product initiatives at Google Cloud and Google Workspace, including bringing Gemini into Gmail, Docs, Slides, and Sheets. For NeuReality, the appointment signals that a strong engineering stack is not enough: the startup also needs someone who can translate complex infrastructure technology into a clear value proposition for major customers, partners, and hardware vendors.

"Corporate AI is entering a new phase," says Agarwal.

The point of her appointment is go-to-market, not rewriting the architecture.

The product itself is being built under co-founder and CEO Moshe Tanach and president Hiren Majmudar, a former senior executive at GlobalFoundries and Intel Capital. Now the company has to prove that its orchestration layer is worth the integration effort, especially when many customers are deeply entrenched in the NVIDIA ecosystem and reluctant to add new layers to their infrastructure stack.

How NR-NEXUS works

The platform was unveiled on March 12, 2026, as a hardware-agnostic operating system for "AI factories," also called token factories. The idea is to avoid tying model execution to a single type of hardware: NR-NEXUS runs on top of CPU, GPU, and network infrastructure and also supports more diverse configurations with different accelerators. This approach targets companies that have already assembled clusters from heterogeneous hardware and do not want to rebuild everything from scratch for each new model or API.

According to NeuReality, the system orchestrates the full inference stack and distributes load across compute, memory, and network. The company specifically highlights the prefill stage (processing the input prompt) and the decode stage (generating output tokens one at a time), which can run more efficiently when placed on different resources. In practice, this should stabilize performance, keep SLAs intact under load, and extract more useful work from accelerators that often sit idle part of the time in real-world deployments.

Unified management layer for CPU, GPU, and NIC
Support for mixed infrastructure without full architectural overhaul
Load routing between open-source and proprietary models and APIs
Increased accelerator utilization and more predictable SLAs
Lower cost per generated token as volumes grow
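NeuReality has not published how NR-NEXUS actually schedules work, so the following is only a generic illustration of the prefill/decode split described above, not the company's method. It is a minimal Python sketch assuming hypothetical device pools, each with a fixed token budget per scheduling window. Each request's prefill phase (typically compute-bound) and decode phase (typically memory-bandwidth-bound) are placed independently, each on the least-utilized pool that still has room. The pool names, budgets, and request sizes are invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Pool:
    """Hypothetical group of devices with a token budget per scheduling window."""
    name: str
    token_budget: int
    load: int = 0

    def utilization(self) -> float:
        return self.load / self.token_budget

    def fits(self, tokens: int) -> bool:
        return self.load + tokens <= self.token_budget


@dataclass
class Request:
    request_id: str
    prompt_tokens: int  # work for the prefill phase
    output_tokens: int  # work for the decode phase


def pick_pool(pools: List[Pool], tokens: int) -> Optional[Pool]:
    """Return the least-utilized pool that can still absorb `tokens`, or None."""
    candidates = [p for p in pools if p.fits(tokens)]
    return min(candidates, key=Pool.utilization, default=None)


def schedule(
    requests: List[Request],
    prefill_pools: List[Pool],
    decode_pools: List[Pool],
) -> Dict[str, Optional[Tuple[str, str]]]:
    """Place each request's prefill and decode phases on separate pools."""
    plan: Dict[str, Optional[Tuple[str, str]]] = {}
    for req in requests:
        pre = pick_pool(prefill_pools, req.prompt_tokens)
        dec = pick_pool(decode_pools, req.output_tokens)
        if pre is None or dec is None:
            # Either phase lacks room in this window: reserve nothing, leave it queued.
            plan[req.request_id] = None
            continue
        pre.load += req.prompt_tokens
        dec.load += req.output_tokens
        plan[req.request_id] = (pre.name, dec.name)
    return plan


# Hypothetical setup: two prefill pools and one smaller decode pool.
prefill = [Pool("gpu-0", 4096), Pool("gpu-1", 4096)]
decode = [Pool("decode-0", 1024)]
reqs = [
    Request("a", 2000, 300),
    Request("b", 3000, 300),
    Request("c", 1500, 300),
    Request("d", 1000, 300),
]
plan = schedule(reqs, prefill, decode)
for rid, placement in plan.items():
    print(rid, placement)
```

In this run, request "d" would fit on gpu-1 for prefill, but the decode pool is already full, so it stays queued: the two phases are constrained separately, which is the point of splitting them. A real disaggregated scheduler would also have to transfer the KV cache from the prefill node to the decode node, account for prefill finishing before decode starts, and retry queued requests in later windows; the sketch leaves all of that out to keep the placement logic visible.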

The software is already in use with beta customers, and full commercial launch is expected later this year. The target audience for NR-NEXUS is clear: neocloud providers, large companies building their own inference capacity, and chipmakers who need a ready-made software layer on top of their hardware. In all three cases, NeuReality sells neither a model nor a chip but infrastructure middleware intended to simplify running production AI services and shorten time-to-market for new models.

Why this is timely

NeuReality is betting on the hottest segment of the market. According to Deloitte's estimates, inference already accounted for roughly half of all AI compute in 2025, and that share could grow to two-thirds in 2026. This also explains the surge in capital spending: Amazon is budgeting approximately $200 billion for 2026, and Google between $175 billion and $185 billion. Yet even with budgets this large, enterprise customers face an old problem: expensive hardware is often unevenly utilized, and the stack is stitched together from too many incompatible components. NeuReality aims to fill exactly this gap between hardware and operations.

The company has raised approximately $70 million to date, including a $35 million Series A at the end of 2022 and another $20 million in March 2024 with backing from the European Innovation Council Fund. Competition is already intense: Modal Labs, Baseten, and Fireworks AI are all contesting the inference-optimization market, each betting that it will become the main management layer now that the focus is moving beyond model training.

What this means

NeuReality's story shows where value in AI infrastructure is shifting: from training models to running them every day. If the startup can prove that NR-NEXUS really increases cluster utilization and lowers the cost per token without vendor lock-in, it has a chance to become a useful layer for enterprise customers who want to build AI services on hardware they have already bought. The winner here may not be whoever has the most GPUs, but whoever manages inference in production best.

Hamidun News

AI news without noise. Daily editorial selection from 50+ sources. A product by Zhemal Khamidun, Head of AI at Alpina Digital.
