Ant Group Teaches Robots Life: New VLA Model Outpaces Pi0.5
AI-processed from Jiqizhixin (机器之心); edited by Hamidun News
While Western venture capitalists are showering money on startups like Physical Intelligence, a player they clearly didn't expect has appeared on the horizon. Ant Group, a company we've grown used to associating with payments and fintech, has decided that robots need brains too. And not just any brains, but an open VLA (Vision-Language-Action) model that in many ways surpasses the industry's current darling, Pi0.5.
If you thought the battle for physical AI was limited to Silicon Valley, I have news for you. But first, let's figure out what VLA is.
It's not just another chatbot that writes poetry. It's an attempt to create a unified neural network that sees the world, understands text commands, and, most importantly, knows how to move mechanical "arms" to complete a task. For a long time, robots were taught each manipulation separately, but VLA promises universality.
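To make the idea concrete, here is a deliberately tiny Python sketch of the VLA interface: one function takes a camera frame, a text command, and the arm's current joint angles, and returns a short sequence of actions. Everything in it (`Observation`, `toy_vla_policy`, the brightest-pixel "vision", the keyword "language" rule, the action format) is a hypothetical illustration, not Ant Group's or Physical Intelligence's code; a real VLA model replaces these hand-written rules with a single trained neural network.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Observation:
    """Hypothetical input bundle for a VLA policy."""
    image: List[List[float]]       # toy grayscale camera frame, pixel values in [0, 1]
    instruction: str               # natural-language command from the user
    joint_positions: List[float]   # current arm joint angles (radians)


def toy_vla_policy(obs: Observation, horizon: int = 4) -> List[List[float]]:
    """Map vision + language + robot state to a short sequence of actions.

    Hypothetical stand-in: each stage below is a hand-written rule, whereas a
    real VLA model learns the whole mapping end to end in one network.
    """
    # "Vision": treat the brightest pixel's column as the target's horizontal position.
    width = len(obs.image[0])
    _, target_col = max((value, col) for row in obs.image for col, value in enumerate(row))
    # Map that column to a target angle for joint 0 in the range [-1, 1].
    target = (target_col / (width - 1)) * 2 - 1 if width > 1 else 0.0

    # "Language": grasp-type verbs set the gripper flag (1.0 = close, 0.0 = open).
    verbs = ("bring", "pick", "grab")
    grip = 1.0 if any(v in obs.instruction.lower() for v in verbs) else 0.0

    # "Action": move joint 0 linearly toward the target over `horizon` steps;
    # each action is [joint angles..., gripper flag].
    start = obs.joint_positions[0]
    chunk = []
    for step in range(1, horizon + 1):
        joints = list(obs.joint_positions)
        joints[0] = start + (target - start) * step / horizon
        chunk.append(joints + [grip])
    return chunk


if __name__ == "__main__":
    obs = Observation(
        image=[[0.1, 0.2, 0.9]],
        instruction="bring me that red mug",
        joint_positions=[0.0, 0.5],
    )
    for action in toy_vla_policy(obs):
        print(action)
```

With a 1×3 frame whose brightest pixel is on the right and the command "bring me that red mug", the sketch returns four steps that swing joint 0 from 0.0 to 1.0 with the gripper flag set to 1.0. Returning several steps at once rather than a single action is just one possible design choice for such an interface.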
Imagine you give a robot the task "bring me that red mug," and it doesn't just recognize the object but builds a movement trajectory in real time, accounting for obstacles. This is the league Ant Group is now playing in.
The intrigue is that Pi0.5 from Physical Intelligence was considered the gold standard for open foundation models in robotics, something like GPT-3 for manipulators. Ant Group, however, claims that its new model surpasses Pi0.5 in command-execution accuracy and in adaptability to new conditions. That is a serious blow to the pride of American engineers. The Chinese company didn't simply copy the architecture; it optimized the way the model connects visual data to physical actions, which allowed it to achieve smoother and more precise movements.
Why is this important right now? We're on the verge of a humanoid robot boom. Hardware is becoming cheaper and more accessible, but the main problem remains software — universal intelligence that will allow a robot to leave the sterile laboratory and enter a real warehouse or residential apartment.
Ant Group is betting on openness. By releasing the model as open-source, they're effectively inviting thousands of developers worldwide to test and improve their code. This is a classic move: if you can't win through closed patents, create a standard that everyone will use.
It's interesting to watch how Ant Group itself is transforming. After all the regulatory turbulence in China, the company is looking for new footholds, and deep tech seems like an ideal refuge. Robotics is not just hype but an enormous market for logistics automation, and China has more logistics to automate than anywhere else.
Perhaps their VLA model was originally trained on data from real Alibaba warehouses, which would give it a huge advantage over models trained in simulation.
There's another important layer to this story: the geopolitical one. While the US imposes chip sanctions, China is responding with a surge in algorithms. An open model of this complexity is a powerful tool of influence. If tomorrow half of the robot-making startups in Europe or Asia build on a base model from Ant Group, the question of whose ecosystem won will answer itself. We're witnessing the struggle for AI leadership shift from text chats to the physical world.
The key point: by releasing in open access a model that surpasses its Western analogs, Ant Group has set a serious precedent. Will Physical Intelligence or OpenAI be able to respond with something more impressive, or will the East firmly secure leadership in "robot brains"?