Source: 36Kr (36氪)

Sunrise S3: China's Answer to Video Memory Hunger and Expensive Generation


AI-processed from 36Kr (36氪); edited by Hamidun News

While the industry struggles with video memory shortages and H100 price tags, the Chinese company Sunrise has decided to play the cards that actually matter to developers. Its new S3 chip is not just another attempt to catch up with the leaders in teraflops but a pragmatic tool for tackling the "memory bottleneck." Anyone who has tried to run a large language model locally knows the problem: compute often sits idle because data cannot be fed from memory fast enough.
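To see why memory, not compute, is often the ceiling, here is a back-of-envelope sketch. It assumes single-stream (batch size 1) decoding, where every weight has to be read from memory once per generated token, and it ignores caches, KV-cache traffic and kernel overheads. The model size, bandwidth and peak-TFLOPS figures are hypothetical illustrations, not Sunrise S3 specifications.

```python
def decode_tokens_per_second_ceiling(n_params: float, bytes_per_param: float,
                                     bandwidth_gb_per_s: float) -> float:
    """Upper bound on batch-1 decode speed when reading weights dominates.

    Each generated token requires streaming every weight from memory once,
    so throughput cannot exceed memory bandwidth divided by model size.
    """
    model_bytes = n_params * bytes_per_param
    return bandwidth_gb_per_s * 1e9 / model_bytes


def arithmetic_intensity(bytes_per_param: float) -> float:
    """FLOPs performed per byte of weights read during batch-1 decoding.

    Each parameter takes part in about 2 FLOPs (a multiply and an add) per token.
    """
    return 2.0 / bytes_per_param


def is_memory_bound(peak_tflops: float, bandwidth_gb_per_s: float,
                    bytes_per_param: float) -> bool:
    """True if the workload sits left of the roofline 'ridge point'."""
    ridge_flops_per_byte = peak_tflops * 1e12 / (bandwidth_gb_per_s * 1e9)
    return arithmetic_intensity(bytes_per_param) < ridge_flops_per_byte


if __name__ == "__main__":
    # Hypothetical: 7B-parameter model, 100 GB/s memory, 100 TFLOPS of compute.
    for name, bpp in [("FP16", 2.0), ("FP4", 0.5)]:
        tps = decode_tokens_per_second_ceiling(7e9, bpp, 100.0)
        bound = is_memory_bound(100.0, 100.0, bpp)
        print(f"{name}: <= {tps:.1f} tokens/s, memory-bound: {bound}")
```

With these made-up numbers, the FP16 model tops out at roughly 7 tokens per second and the FP4 version at roughly 28, while the compute units stay mostly idle either way: about 1 to 4 FLOPs per byte of weights, against a ridge point of about 1,000.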

S3 supports the LPDDR6 memory standard. This is a first for Chinese GPGPU designs, and the move looks bold, to say the least. Thanks to it, available memory capacity is four times that of the company's previous-generation chips.
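What a fourfold capacity increase buys can be estimated the same way: the memory a deployment needs is roughly the model weights plus a KV cache that grows with context length. The sketch below assumes a hypothetical 7B-parameter model with grouped-query attention (32 layers, 8 KV heads, head dimension 128), stores the KV cache at the same precision as the weights, and uses a hypothetical 16 GB baseline; none of these figures come from Sunrise.

```python
from dataclasses import dataclass


@dataclass
class ModelShape:
    """Hypothetical transformer dimensions used for sizing."""
    n_params: float   # total number of weights
    n_layers: int
    n_kv_heads: int   # key/value heads (fewer than query heads with GQA)
    head_dim: int


def weight_bytes(m: ModelShape, bytes_per_param: float) -> float:
    return m.n_params * bytes_per_param


def kv_cache_bytes(m: ModelShape, seq_len: int, batch: int,
                   bytes_per_elem: float) -> float:
    # Factor 2: each layer caches one key and one value vector per head per token.
    return 2 * m.n_layers * m.n_kv_heads * m.head_dim * seq_len * batch * bytes_per_elem


def fits_in_memory(m: ModelShape, seq_len: int, batch: int,
                   bytes_per_param: float, capacity_bytes: float) -> bool:
    # Simplification: KV cache stored at the same precision as the weights;
    # activations and runtime overheads are ignored.
    total = (weight_bytes(m, bytes_per_param)
             + kv_cache_bytes(m, seq_len, batch, bytes_per_param))
    return total <= capacity_bytes


if __name__ == "__main__":
    model = ModelShape(n_params=7e9, n_layers=32, n_kv_heads=8, head_dim=128)
    baseline = 16e9             # hypothetical previous-generation capacity, bytes
    quadrupled = 4 * baseline   # the "fourfold" increase
    for cap in (baseline, quadrupled):
        ok = fits_in_memory(model, seq_len=32_768, batch=1,
                            bytes_per_param=2, capacity_bytes=cap)
        print(f"{cap / 1e9:.0f} GB: FP16 weights + 32K-token KV cache fit = {ok}")
```

Under these assumptions, a 32K-token context at FP16 adds exactly 4 GiB of KV cache to 14 GB of weights, which overflows the 16 GB baseline but fits comfortably in four times that capacity.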

In a world where model parameter counts grow faster than hardware budgets, such a leap makes it possible to keep much longer contexts and larger model weights in memory without falling back on slow external storage. The second engineering trick is computational flexibility: S3 can switch between FP16 and FP4 precision on the fly.

For those not following the nuances of quantization: moving to FP4 compresses model data, ideally without a critical loss in answer quality. This directly affects generation speed and, more importantly, the economics of the process. When a model takes up less memory and needs fewer resources to produce each token, operating costs fall sharply.
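For readers curious what "FP4" means in practice, here is a minimal pure-Python sketch of block-scaled 4-bit quantization. It assumes the E2M1 layout (1 sign bit, 2 exponent bits, 1 mantissa bit), whose representable magnitudes are 0, 0.5, 1, 1.5, 2, 3, 4 and 6, plus a simple absmax scale per block of 32 values. This is an illustrative scheme, not Sunrise's actual FP4 format, which the source does not describe; production formats such as OCP MXFP4 use power-of-two shared scales rather than the float scale used here.

```python
# Magnitudes representable by an FP4 E2M1 value (sign stored separately).
FP4_E2M1_MAGNITUDES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
FP4_MAX = FP4_E2M1_MAGNITUDES[-1]


def quantize_fp4(values, block_size=32):
    """Quantize floats to 4-bit codes with one absmax scale per block.

    Each code packs a sign bit (bit 3) and a 3-bit index into
    FP4_E2M1_MAGNITUDES (bits 0-2).
    """
    codes, scales = [], []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        absmax = max(abs(v) for v in block)
        # Map the largest magnitude in the block onto FP4_MAX (6.0).
        scale = absmax / FP4_MAX if absmax > 0 else 1.0
        scales.append(scale)
        for v in block:
            target = abs(v) / scale
            # Round to the nearest representable magnitude (ties go to the smaller one).
            idx = min(range(len(FP4_E2M1_MAGNITUDES)),
                      key=lambda i: abs(FP4_E2M1_MAGNITUDES[i] - target))
            codes.append((8 if v < 0 else 0) | idx)
    return codes, scales


def dequantize_fp4(codes, scales, block_size=32):
    """Reconstruct approximate floats from codes and per-block scales."""
    out = []
    for i, code in enumerate(codes):
        sign = -1.0 if code & 8 else 1.0
        out.append(sign * FP4_E2M1_MAGNITUDES[code & 7] * scales[i // block_size])
    return out


def bits_per_value(block_size=32, scale_bits=16):
    """Average storage cost: 4 bits per code plus the amortized block scale."""
    return 4 + scale_bits / block_size


if __name__ == "__main__":
    weights = [0.12, -0.03, 0.51, -0.6, 0.27, 0.0]
    codes, scales = quantize_fp4(weights)
    restored = dequantize_fp4(codes, scales)
    for w, r in zip(weights, restored):
        print(f"{w:+.3f} -> {r:+.3f}")
    print(f"{bits_per_value()} bits/value vs 16 for FP16")
```

With a 16-bit scale per 32-value block, storage works out to 4.5 bits per weight, roughly 3.6 times smaller than FP16; the price is a rounding error of up to one scale unit per value in the worst case.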

Sunrise's own figures look almost provocative: on popular DeepSeek-family models, the company says the cost of generating a token fell by 90% compared with its previous solutions. If these numbers hold up in real server racks, we could see a new wave of affordable AI services that don't require billion-dollar infrastructure investments.
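The economics follow from simple arithmetic: for a fixed hourly cost of running a server, the cost per token is inversely proportional to throughput. The sketch below uses hypothetical prices and throughputs (not Sunrise figures) to show that a 90% cost reduction is the same thing as serving ten times as many tokens per dollar.

```python
SECONDS_PER_HOUR = 3600


def cost_per_million_tokens(hourly_cost_usd: float, tokens_per_second: float) -> float:
    """Serving cost per one million generated tokens at full utilization."""
    tokens_per_hour = tokens_per_second * SECONDS_PER_HOUR
    return hourly_cost_usd / tokens_per_hour * 1_000_000


def reduction_percent(old_cost: float, new_cost: float) -> float:
    """Relative cost reduction, in percent."""
    return (1.0 - new_cost / old_cost) * 100.0


if __name__ == "__main__":
    # Hypothetical: the same $2/hour server, with 10x higher throughput after optimization.
    old = cost_per_million_tokens(2.0, 1_000)
    new = cost_per_million_tokens(2.0, 10_000)
    print(f"old: ${old:.3f}/M tokens, new: ${new:.3f}/M tokens")
    print(f"reduction: {reduction_percent(old, new):.0f}%")
```

Note that the hourly cost cancels out of the percentage: whatever the server costs, a tenfold throughput gain at the same price yields the same 90% cut.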

This is especially relevant for the Chinese market, where sanctions limit access to advanced Nvidia accelerators while demand for compute to run national LLMs keeps growing. Context matters here: Sunrise is not trying to build a general-purpose machine for training models from scratch. S3 is a narrowly specialized chip for inference, that is, for running neural networks that have already been trained.

Inference is where most of the money in the AI business is being burned right now. If you can serve answers to users ten times more cheaply than your competitors, your business model suddenly starts to look viable. Ultimately, though, S3's success will depend not only on the hardware but also on software support.

Chinese manufacturers often stumble precisely on drivers and compatibility with popular libraries such as PyTorch. However, the focus on DeepSeek, the most popular open model in the region, gives them an excellent springboard for launch. The era in which we judged GPUs by raw compute alone seems to be fading, giving way to an era of memory efficiency.

The bottom line: Sunrise S3 suggests that optimizing for specific architectures such as DeepSeek and adopting LPDDR6 memory can deliver bigger efficiency gains than simply chasing smaller process nodes. Could this approach become the worldwide standard for budget inference?

Hamidun News
AI news without noise. Daily editorial selection from 400+ sources. A product by Zhemal Khamidun, Head of AI at Alpina Digital.

