Latest publications

Google DeepMind Introduces DiffusionGemma for Fast Text Generation on NVIDIA
Google DeepMind has optimized the new DiffusionGemma model for NVIDIA platforms to accelerate real-time text generation in chats, copilots, and agents.

NVIDIA MCG Toolkit Automates AI Model Documentation for New Regulations
NVIDIA presents a tool for automating AI model documentation that helps teams comply with regulatory requirements such as the EU AI Act and create verifiable model cards without manual work.

NVIDIA Introduces DynoSim for Optimizing LLM Serving Parameters
NVIDIA introduced DynoSim, a tool for automatic optimization of LLM serving configuration through Pareto frontier simulation.

NVIDIA RTX Introduces DLSS 4.5 and Multilingual AI Characters for Games
NVIDIA released an RTX update with support for multilingual AI characters via ACE and new DLSS 4.5 for Unreal Engine, simplifying AI-powered game development.

NVIDIA RTX PRO 4500 Blackwell: Accelerating Genomics and Protein Modeling
NVIDIA has released the RTX PRO 4500 Blackwell graphics card to accelerate genomic computations and protein modeling—key components of precision medicine.

NVIDIA's New CompileIQ Unlocks Hidden GPU Kernel Potential Through Compiler Parameter Tuning
NVIDIA introduced CompileIQ, an AI tool that automatically selects optimal compiler parameters for maximum GPU kernel performance, finding speedups where manual optimization has been exhausted.

NVIDIA CUDA 13.3 Simplifies GPU Development with Tile Programming in C++
NVIDIA released CUDA 13.3 with Tile programming in C++, which automates optimization of low-level GPU memory operations for developers.

NVIDIA Blackwell Sets STAC-AI Record in Financial AI Trading
The NVIDIA Blackwell architecture demonstrated superior performance in the STAC-AI benchmark for financial AI, processing larger volumes of data faster than all competitors.

NVIDIA Added CUDA Tile for GPU Kernel Optimization in C++
NVIDIA introduced CUDA Tile, a technology built into C++ for developing high-performance GPU kernels based on a tile-oriented approach.

NVIDIA Dynamo Snapshot: Accelerating Model Launch on Kubernetes
NVIDIA introduced a tool that cuts the time to load inference models on Kubernetes from minutes to seconds, helping avoid wasted GPU capacity during traffic spikes.

StepFun Presents Step 3.7 Flash on NVIDIA GPUs for Multimodal Work
StepFun launched Step 3.7 Flash, a multimodal AI model with 198 billion parameters that simultaneously processes text, images, videos, and documents on NVIDIA accelerators.

NVIDIA Helps Telecom Companies Deploy Sovereign AI Factories with Token-Metering
Telecom companies are building sovereign AI infrastructures on NVIDIA Cloud Partner architecture, using token-metering for controlled access—an approach to scalable, high-margin services for governments and enterprises.

NVIDIA GB200: Exascale Computing in a Rack through Intelligent Task Scheduling
NVIDIA demonstrated how to maximize GB200 NVL72 performance through Slurm with network topology awareness; the results show exascale computing in a single rack.

NVIDIA Shows How to Track GPUs in Kubernetes Clusters
Most teams underutilize GPUs in Kubernetes clusters because they simply don't see who's using them, how much memory is consumed, and whether containers are hanging.

NVIDIA Showed How Multi-Agent Systems Find Signals in Financial Markets
Multi-agent AI systems help researchers automate the search for trading signals in market data by analyzing prices, economic indicators, and alternative sources to identify hidden patterns.

NVIDIA Unveiled a Tool for Generating 3D Medical Images
NVIDIA presented the NV-Generate-CTMR framework for automatic synthesis of realistic 3D medical images, addressing the data shortage in radiology and accelerating the training of generalized AI models.

NVIDIA Vera Rubin: How Developers Will Scale Agentic AI Without Latency
NVIDIA introduced Vera Rubin, a platform for scaling agentic AI that combines the Vera Rubin NVL72 GPU and the Groq 3 LPX accelerator to achieve 400 tokens per second on trillion-parameter models.

NVIDIA Shows the Difference Between Evaluating Models and Evaluating AI Agents
Model benchmarks and agent evaluation serve different purposes: the former tests language understanding, while the latter tests the real-world behavior of the system in action.

NVIDIA Developed a Skills Verification System for Managing AI Agents
NVIDIA introduced an approach to verifying and managing skills — the instructions an AI agent uses. This allows organizations to scale autonomous systems safely.

NVIDIA Released AI-Q for Deep Research in Agent Frameworks
NVIDIA introduced AI-Q, a specialized component for delegating complex research to a separate backend. It supports enterprise data via MCP and works with Claude Code, Codex, and other agents.

How NVIDIA Recommends Adapting AI Agents for Specific Tasks
An NVIDIA publication breaks down nine techniques for customizing a general-purpose model for tasks such as logistics, customer support, and code generation. Proper tuning reduces hallucinations and cost.