DGX Spark with Qwen3: the NVIDIA test that didn't tell the whole story
AI-processed from Habr AI; edited by Hamidun News
As large language models (LLMs) grow more powerful and more demanding, the lack of memory to run them becomes an acute problem. Enthusiasts and professionals alike run into models with tens of billions of parameters that simply do not fit into the limited video memory (VRAM) of modern graphics processors. That forces a compromise: either offload part of the model to the central processor (CPU), which catastrophically reduces performance, or turn to expensive cloud solutions, which also raise concerns about data privacy.
To address this problem, NVIDIA introduced the DGX Spark system, built on the GB10 chip and equipped with 128 GB of unified memory. The stated price of the device ranges from 400 to 500 thousand rubles. However, two weeks of in-depth testing with the Qwen3 model produced mixed results, raising questions about the real value of this solution.
The problem DGX Spark is supposed to solve is familiar to anyone who works with LLMs. A typical case is an attempt to run a model with 32 billion parameters on an RTX 4090-class graphics card with 24 GB of VRAM. A memory deficit is inevitable.
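The memory deficit described above follows from simple arithmetic. The sketch below is a rough estimate, not a measurement: it counts the weights only and ignores the KV cache, activations, and runtime overhead, so real requirements are higher. The function name and the list of precisions are illustrative assumptions, not taken from the original tests.

```python
def weight_memory_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in GiB.

    Assumption: every parameter is stored at the same precision; real
    quantization formats add small per-block overheads not counted here.
    """
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


VRAM_GIB = 24  # RTX 4090-class card, as in the article

if __name__ == "__main__":
    # Illustrative precisions; the article does not list the formats tested.
    for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
        need = weight_memory_gib(32, bits)
        verdict = "fits" if need < VRAM_GIB else "does not fit"
        print(f"32B weights at {label}: {need:.1f} GiB, {verdict} in {VRAM_GIB} GiB")
```

Under these assumptions, a 32-billion-parameter model exceeds 24 GB at 16-bit and 8-bit precision on weights alone. Only an aggressive 4-bit quantization squeezes in, and it leaves limited room for context.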
The alternative in the form of CPU offload, although it allows you to run the model, leads to an unacceptable drop in performance. Cloud services, in turn, not only require significant financial investment, but also raise questions about the security and privacy of processed data, since it is transmitted to third-party servers. It is in this context that NVIDIA's proposal in the form of DGX Spark with its 128 GB of unified memory looks promising.
Unified memory, unlike traditional separate CPU and GPU memory, lets both processors work with the same pool of memory without copying data between them, which in theory should speed up processing.
In-depth testing of DGX Spark with the Qwen3 model revealed a number of nuances. The benchmarks compared several quantization formats of the model (quantization reduces model size and memory requirements at the cost of some accuracy), varied the size of the input context (the amount of text the model processes at once), and compared performance against more familiar GPU solutions. The results were mixed.
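Context size matters for memory because the attention key-value (KV) cache grows linearly with the number of tokens. The sketch below shows the standard estimate for a transformer with grouped-query attention. The architecture numbers (64 layers, 8 KV heads, head dimension 128) are illustrative assumptions rather than values reported in the tests, and `AttnShape` and `kv_cache_gib` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class AttnShape:
    """Attention dimensions that determine KV cache size (illustrative)."""
    n_layers: int
    n_kv_heads: int
    head_dim: int


def kv_cache_gib(shape: AttnShape, context_tokens: int, bytes_per_value: int = 2) -> float:
    """Approximate KV cache size in GiB for one sequence.

    Each token stores one key and one value vector (hence the factor 2)
    per layer and per KV head. bytes_per_value=2 assumes a 16-bit cache.
    """
    per_token_bytes = (
        2 * shape.n_layers * shape.n_kv_heads * shape.head_dim * bytes_per_value
    )
    return per_token_bytes * context_tokens / 2**30


if __name__ == "__main__":
    # Hypothetical shape, chosen only to make the arithmetic concrete.
    shape = AttnShape(n_layers=64, n_kv_heads=8, head_dim=128)
    for tokens in (4_096, 32_768, 131_072):
        print(f"{tokens:>7} tokens: {kv_cache_gib(shape, tokens):.1f} GiB of KV cache")
```

With these assumed dimensions, the cache costs 256 KiB per token, so a long context can add tens of gigabytes on top of the weights. That is the situation in which a large memory pool is expected to help.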
In some scenarios DGX Spark did show advantages, especially when working with amounts of data that do not fit into the VRAM of standard graphics cards. In other cases, particularly under intensive loads or with certain types of models, the system did not deliver the expected performance gains. At times DGX Spark even fell short of well-optimized solutions built on multiple powerful GPUs, or the workload still had to be moved to expensive cloud resources, which negates the device's main advantage: local data processing.
In some tests, when the model did not fully fit into unified memory, the system automatically fell back to the CPU, causing a significant slowdown comparable to regular offload.
These mixed results call for careful analysis. At a price of up to half a million rubles, the cost-effectiveness of DGX Spark is questionable, especially since in a number of scenarios it offers no tangible advantage over more accessible or traditional solutions. Architectural limitations that show up under certain types of loads make it a niche product rather than a universal solution.
NVIDIA's marketing materials likely emphasize maximum performance indicators and scenarios where 128 GB of memory is indeed a decisive factor, while downplaying situations where this architecture may prove inefficient or even inferior. This means that potential buyers need to carefully weigh their tasks and compare them with the real capabilities of DGX Spark, rather than relying solely on advertising promises.
In conclusion, DGX Spark with 128 GB of unified memory is an interesting but imperfect step in the development of hardware for large language models. It can pay off for a narrow range of tasks where memory capacity is critical and other solutions simply cannot cope. For most users, however, 128 GB of unified memory is not a "silver bullet" that solves every problem. For maximum performance and cost-effectiveness, optimized multi-GPU solutions or even hybrid approaches are often preferable. Careful testing and an understanding of DGX Spark's architectural features are key to deciding whether the device is worth its considerable cost for your specific needs.