🎧 Robotics: This Week's Main News
AI-processed from Hamidun News Podcast; edited by Hamidun News
_Audio podcast — two AI hosts discuss the latest AI news. Full transcript below._
Host A (00:00): So let's break this down step by step. Today we have, metaphorically speaking, an entire stack of fresh materials for this deep dive.
Host B (00:11): Yeah, and I'd say these materials are quite paradigm-shifting.
Host A (00:16): Absolutely. We have confidential startup reports, dry scientific publications on new benchmarks, and even investment summaries from giants like NVIDIA.
Host B (00:28): Yes, yes, all mixed together, but in a very logical mix.
Host A (00:32): Exactly. And if we reduce all these numbers, graphs, and news to one thought, we're standing on the threshold of a truly fundamental shift.
Host B (00:41): A shift from rigid control to chaos, right?
Host A (00:44): Yes, precisely to surviving in chaos. We've all gotten used to the image of a modern factory—worth hundreds of millions of dollars. Where everything works like a perfect clock mechanism.
Host B (00:56): Where each part slides down the conveyor, and robots make such beautiful, precise sweeps.
Host A (01:03): Those perfect sweeps. But if someone accidentally leaves a wrench on the floor or moves a workbench just a couple of centimeters, that's it. This whole idyll collapses.
Host B (01:17): The robot blindly crashes into obstacles.
Host A (01:19): Yes. A manipulator with rigid geometric coordinates hardwired into it simply freezes, throws an error, or breaks expensive equipment.
Host B (01:31): And this problem of rigid programming has been perhaps the main anchor holding back the entire industry for decades.
Host A (01:37): One step left, one step right—catastrophe.
Host B (01:39): Exactly. Machines excel at repeating the same mathematical operation a million times, but they turn out to be completely helpless against basic, everyday chaos of the real physical world.
Host A (01:53): They simply have no intuition.
Host B (01:55): Right, they lack what we call understanding of physical context. Or rather, they did until recently. Based on the data we have, the rules of the game are being rewritten right now.
Host A (02:08): And that's exactly the main mission of our discussion today. We're exploring how artificial intelligence is right before our eyes acquiring a physical body.
Host B (02:18): Abandoning multi-volume instruction manuals.
Host A (02:20): Learning to survive in an unpredictable environment. If we analyze all the sources, one striking insight emerges. The future of the real machine revolution is being built not on giant computing power.
Host B (02:32): And not on endless server farms.
Host A (02:35): Right. It's being built on incredibly elegant, compact, local solutions and muscle-like adaptability.
Host B (02:41): Listen, to truly grasp the scale of these changes, we need to descend to the basic level of mechanics. Before entrusting a robot with a global supply chain, it needs to master fundamental physics.
Host A (02:54): Like just picking up a part and not breaking it?
Host B (02:56): Exactly. Grabbing a crooked, strangely-shaped part and not crushing it. And the documents show a completely unconventional approach here.
Host A (03:05): Oh yes, one study describes a very revealing experiment. An engineer took a small tracked robot with a manipulator and integrated Google's compact Gemma language model into it.
Host B (03:18): And here are the important numbers.
Host A (03:19): Yes, the most important numbers. This model has only 270 million parameters.
Host B (03:25): Which, next to the monstrous versions of GPT, is just microscopically small. Those require entire data centers and almost nuclear power plants to run.
Host A (03:36): Absolutely. And here the project author describes this as genuine cyberpunk. The robot trains in simulation, it has no internet access at all, it doesn't reach out to any cloud servers.
Host B (03:48): All local.
Host A (03:49): Completely. And here I want to pause. Why disconnect a modern robot from the cloud, where all that seemingly unlimited compute lives?
Host B (03:58): Well, because in the physical world the cloud is fatal, and the reason is latency. Signal delay decides everything. Imagine a robot trying to hold a slipping fragile object. The signal from the sensors must travel to a server somewhere on the other side of the world, get processed by a massive model, and come back with a command to tighten the grip by 2 millimeters.
Host A (04:21): And that takes half a second?
Host B (04:23): Yes, and in half a second the object is already shattered on the concrete floor.
Host A (04:28): So this is the difference between searching for an answer in a huge library on the other side of the city and just pulling your hand away from a hot stove at the spinal cord level?
Host B (04:36): Excellent analogy. We need exactly local reflexes. And this compact 270-million-parameter model provides the autonomy we need.
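A quick back-of-the-envelope check of the half-second figure, assuming the object simply falls from rest once the grip is lost. The 10 ms on-board loop period is an illustrative assumption, not a number from the sources.

```python
G = 9.81  # gravitational acceleration, m/s^2

def drop_distance_m(latency_s: float) -> float:
    """Distance an object falls from rest during latency_s seconds."""
    return 0.5 * G * latency_s ** 2

# 0.5 s is the cloud round trip mentioned in the conversation;
# 0.01 s is an assumed on-board control-loop period.
for label, latency in [("cloud round trip", 0.5), ("on-board loop", 0.01)]:
    fall_cm = drop_distance_m(latency) * 100
    print(f"{label}: {latency * 1000:.0f} ms -> object falls {fall_cm:.2f} cm")
```

Under these assumptions the object drops roughly 1.2 meters during a half-second round trip, versus a fraction of a millimeter within a 10 ms local loop.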
Host A (04:46): Plus, probably power consumption?
Host B (04:48): Of course. Constantly maintaining an active connection to the cloud, transmitting video streams—that's murder for a mobile agent's battery.
Host A (04:56): Got it. In this experiment, the compact model locally received data about joint rotation angles, coordinates, images, and learned to move through trial and error right on board.
Host B (05:07): Using simulators, yes.
Host A (05:08): But here we see 100% simulation. The model sits in a virtual box. In our sources there's also a completely opposite approach to the same chaos problem.
Host B (05:19): Oh, you mean Generalist?
Host A (05:21): Yes. And it sounds even more fantastic. The startup Generalist, which according to reports received investment from NVIDIA. These guys took a completely different path.
Host B (05:31): Instead of virtual reality they use real people?
Host A (05:35): Yes. Instead of writing code, they use what's called messy human data. Regular factory workers are fitted with wearable sensors on their wrists.
Host B (05:45): Mmm, visually it looks like advanced fitness trackers.
Host A (05:49): Right. And these bracelets simply record the pure physics of human movements during daily routine.
Host B (05:56): Every elbow angle, every micro-acceleration...
Host A (06:00): ...of the hand. The slightest adaptations when a person picks up that crooked part.
Host B (06:05): And the results of this approach, let's say, shatter old robotic dogmas. The Generalist reports feature a 99% success rate in real unpredictable factory conditions.
Host A (06:31): So the robot literally absorbs someone else's physical experience. When a part is positioned unusually, it doesn't throw a syntax error—it remembers that very pattern it observed from the human operator?
Host B (06:43): Yes, it remembers the adaptation of the hand.
Host A (06:46): Listen, but an amusing thought comes to mind. If the robot learns from raw human movements, won't it accidentally copy our bad habits?
Host B (06:54): What do you mean?
Host A (06:55): Well, a worker gets distracted, scratches their head with the bracelet, then picks up the part. Will the robot also do this micro-pause to scratch its head?
Host B (07:04): Ah, well, that's exactly what data-cleaning algorithms are for. But there's some truth to it—the machine does inherit specifically human kinematics. Indeed. And what's important here is how this raw physical data collection aligns with the first approach—Google's simulation.
Host A (07:23): Yes, because at first glance these are two completely opposite poles. One sits in a sterile matrix, the other absorbs the chaos of real...
Host B (07:31): ...shop floors. But at the system level they solve different tasks in one chain. See, simulation is the ideal safe testing ground.
Host A (07:38): Where compact models can fall a million times.
Host B (07:41): Exactly. Fall, crash into walls, break a virtual manipulator. They learn basic logic without risk of damaging physical hardware worth hundreds of thousands of dollars.
Host A (07:52): Makes sense.
Host B (07:53): But no simulation, even the most advanced one, can mathematically calculate all the nuances of the real world. Point-load wear on a gear, a random drop of oil.
Host A (08:04): Or a glint of light from a window that blinds a sensor.
Host B (08:08): Right, and here's where real-world data comes into play. Collecting physical metrics gives that very intuitive muscle memory that simply can't be generated in code.
Host A (08:21): So the industry is assembling a hybrid brain that learned logic in virtual reality and reflexes copied from harsh reality.
Host B (08:29): Absolutely right.
Host A (08:31): And factory reality is indeed harsh. And here's where the really interesting part starts in our materials. Let's say we taught a robot to move perfectly—it's agile. But being agile for 5 minutes at a presentation doesn't mean surviving. Factory 40-degree heat is a brutal test for hardware.
Host A (08:50): What happens if an agent works 24/7 without breaks?
Host B (08:53): Oh, this question forced researchers to rethink the very methods of AI evaluation. The documents describe a completely new testing standard: the MELT-1 benchmark.
Host A (09:03): For a long time, models were measured with tests like MMLU, right?
Host B (09:06): Yes, but that's static. You give a model text on law and it generates an answer.
Host A (09:12): Essentially a test of erudition in a vacuum.
Host B (09:15): But embodied AI requires different metrics. The MELT-1 benchmark measures the cost of successful solutions, reaction time under stress, and survival under so-called equipment drift.
Host A (09:28): We should clarify the conditions of this benchmark because they sound like torture. Temperature 40 degrees Celsius, 30 days of continuous autonomous operation.
Host B (09:37): Like leaving a laptop on a car dashboard in the sun and running a complex game.
Host A (09:42): Exactly. And the numbers from the MELT-1 report are simply staggering. It features the Metabolic AI architecture, which doesn't use Transformers at all.
Host B (09:52): And by composite survivability metrics this Metabolic AI outperformed the well-known Llama 7B Int8 model by 1,600 times.
Host A (10:01): Think about that gap! 1,600 times! The text even contains an alarming statement, I quote: "Transformers die after 11 hours under drift."
Host B (10:13): Well, if you break down the mechanics, it becomes clear why this collapse happens. Transformers were historically designed for batch processing.
Host A (10:21): Meaning they receive requests?
Host B (10:23): Scan the weights, output an answer, and roughly speaking, sleep until the next request. But embodied AI has no right to sleep—it must read data streams every millisecond.
Host A (10:37): And what exactly is meant by this drift that kills a model in 11 hours?
Host B (10:42): Hardware drift is the inevitable change in system properties over time. During long operation, motors heat up, provide different resistance. Factory lubricant loses viscosity.
Host A (10:54): Dust settles on lenses?
Host B (10:56): Exactly, signals get distorted. Transformers can't adapt to this continuous stream of changing data. They accumulate mathematical errors. After 11 hours, the errors overflow the context, and the robot freezes.
Host A (11:10): Or starts twitching chaotically. But the Metabolic AI architecture works differently.
Host B (11:16): Yes, the word metabolic is not accidental. It works like a digestive system for data, constantly digesting the stream, filtering noise, and adapting to heat on the fly.
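To make the idea of drift concrete, here is a toy sketch. It is not how MELT-1, Transformers, or Metabolic AI work; it only shows a sensor offset that creeps upward over time, read once with a fixed factory calibration and once against a baseline tracked by an exponential moving average. All names and numbers (`drift_per_hour`, `threshold`, `alpha`) are assumptions chosen for illustration.

```python
import random

def simulate(hours: int = 24, drift_per_hour: float = 0.05,
             threshold: float = 0.5, alpha: float = 0.1, seed: int = 0):
    """Count false alarms for a fixed vs. an adaptive sensor baseline.

    The true signal is always 0 (nothing touches the sensor), so every
    reading judged to be above `threshold` is a false alarm.
    """
    rng = random.Random(seed)
    baseline = 0.0                 # adaptive estimate of the slow offset
    fixed_alarms = adaptive_alarms = 0
    for minute in range(hours * 60):
        bias = drift_per_hour * minute / 60.0     # offset grows as hardware heats up
        reading = bias + rng.gauss(0.0, 0.02)     # drifted, noisy measurement
        if reading > threshold:                   # factory calibration, never updated
            fixed_alarms += 1
        if reading - baseline > threshold:        # compare against the tracked baseline
            adaptive_alarms += 1
        baseline += alpha * (reading - baseline)  # exponential moving average update
    return fixed_alarms, adaptive_alarms

fixed, adaptive = simulate()
print(f"fixed calibration: {fixed} false alarms, adaptive baseline: {adaptive}")
```

With these assumed values the fixed calibration starts misfiring once the offset crosses the threshold, while the tracked baseline absorbs the slow change. A real system would also have to distinguish drift from a genuine sustained signal, which this sketch ignores.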
Host A (11:26): Remarkable. And it's important to emphasize one detail: the intellectual property behind Metabolic AI is protected by patents, but the MELT-1 benchmark itself is completely open to the community.
Host B (11:38): Now any engineer can subject their robot to this test, and that's a colossal step—we stop evaluating physical robots by how well they write text.
Host A (11:47): We test real survival. Alright, factory chaos conquered. But real chaos begins where there are pedestrians, cyclists, couriers...
Host B (11:57): City streets.
Host A (11:58): Exactly. If we have resilient systems, it's time to release them onto the streets. And here two companies emerge. The first is London-based Wayve. Their CEO Alex Kendall is making a bold bet.
Host B (12:11): Yes, his strategy is a complete rejection of hard-coded traffic rules. They don't program each scenario.
Host A (12:19): Like, what to do if a dog in a red collar runs out?
Host B (12:23): Something like that. They implement end-to-end AI that learns to drive cars right on real roads. The car goes out onto London streets, observes dense traffic, and develops an understanding of chaos.
Host A (12:36): And according to the report, they transfer the experience accumulated in London to cars in San Francisco. And adaptation to a new city happens 1,000 times faster than competitors.
Host B (12:47): Who are still trying to map every intersection in 3D.
Host A (12:50): But listen, as a skeptic, I must raise criticism. Training a model on live London streets, among real pedestrians. That sounds like a scenario for a massive lawsuit.
Host B (13:02): Well, it sounds risky, yes.
Host A (13:04): It's one thing when an algorithm makes a mistake in a simulator. It's quite another when a 2-ton metal machine decides to try a new pattern at a pedestrian crossing.
Host B (13:14): That's a fair concern, but the testing architecture is more complex. They don't release a blank-slate, unpredictable neural network onto the roads; there's a rigid safety framework in place.
Host A (13:27): So basic physics of braking?
Host B (13:29): Yes. Braking, obstacle detection that blocks critical errors, but the nuances of smooth merging into traffic flow, micro-concessions at intersections...
Host A (13:40): The things that make driving human?
Host B (13:43): Yes, the machine can only master this empirically.
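The split described here, a learned policy wrapped in a rigid rule-based layer, can be sketched roughly as follows. This is a generic illustration of the pattern, not the architecture of any company mentioned; `learned_policy`, `MAX_BRAKE`, and `MARGIN_M` are hypothetical names and values.

```python
from dataclasses import dataclass

@dataclass
class State:
    speed_mps: float   # current vehicle speed, m/s
    gap_m: float       # distance to the nearest obstacle ahead, m

MAX_BRAKE = 6.0        # assumed maximum deceleration, m/s^2
MARGIN_M = 2.0         # assumed extra safety margin, m

def stopping_distance_m(speed_mps: float) -> float:
    """Distance needed to stop from speed_mps at MAX_BRAKE: v^2 / (2a)."""
    return speed_mps ** 2 / (2.0 * MAX_BRAKE)

def learned_policy(state: State) -> float:
    """Stand-in for a learned driving policy; returns requested acceleration."""
    return 1.0         # hypothetical: always asks to speed up gently

def safe_action(state: State) -> float:
    """Pass the policy's command through unless braking is required."""
    if stopping_distance_m(state.speed_mps) + MARGIN_M >= state.gap_m:
        return -MAX_BRAKE          # rule-based override: brake hard
    return learned_policy(state)

print(safe_action(State(speed_mps=10.0, gap_m=50.0)))  # policy command passes through
print(safe_action(State(speed_mps=20.0, gap_m=30.0)))  # override brakes
```

The learned part is free to choose among actions the rule layer considers safe, which is where the "nuances" of merging and yielding would live.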
Host A (13:46): Got it, the framework won't let anyone get hit. What about the second company, Einride? That's logistics: autonomous electric trucks. CEO Roozbeh Charli presents hard arguments.
Host B (13:58): Economic arguments.
Host A (13:59): Right, he claims that autonomy fundamentally breaks the financial model because driver salary is 30-40 percent of all logistics company expenses.
Host B (14:10): That's not insignificant.
Host A (14:10): And yet he adds the obligatory phrase that humans won't disappear but will transition to a new role as an operator in the dispatch center. My skeptic side is alarmed again.
Host B (14:21): What, is it PR?
Host A (14:22): Yes, will people really stay needed, or is this just corporate reassurance to not scare society with unemployment?
Host B (14:29): If you look at a one-to-two-year horizon, it seems like PR. But if you analyze the entire supply chain from the report, the picture is different. Embodied AI handles tactics brilliantly.
Host A (14:43): Keep the truck in its lane, calculate braking distance?
Host B (14:46): But it's absolutely incapable of taking on macro-strategic and financial responsibility. There's a key quote from Charli in the source: "We need people who understand both logistics and technology simultaneously."
Host A (15:01): That makes sense. The algorithm drives a truck through a blizzard brilliantly, but if there's a sudden strike at the border, the algorithm won't renegotiate with suppliers.
Host B (15:11): Exactly. There's a shift of the human role up the chain. Monotonous steering is given to the machine.
Host A (15:19): It doesn't sleep, doesn't drink coffee.
Host B (15:21): Right, and a human becomes a systems analyst. One operator in an office controls a fleet of dozens of trucks. This is an objective need for people, but with a different set of meta-skills.
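As rough arithmetic on the 30-40 percent figure quoted earlier: if one operator supervises N trucks, is paid the same as one driver, and every other cost stays unchanged (all three are simplifying assumptions, not claims from the sources), the labor share of the original budget shrinks by a factor of N.

```python
def labor_share_after(driver_share: float, trucks_per_operator: int) -> float:
    """Labor cost as a fraction of the ORIGINAL total expenses.

    Assumes an operator costs the same as one driver and that all
    non-labor costs are unchanged.
    """
    if trucks_per_operator < 1:
        raise ValueError("trucks_per_operator must be at least 1")
    return driver_share / trucks_per_operator

# 0.30 and 0.40 come from the conversation; 10 and 24 trucks per operator
# are illustrative stand-ins for "dozens".
for share in (0.30, 0.40):
    for trucks in (10, 24):
        after = labor_share_after(share, trucks)
        print(f"driver share {share:.0%}, {trucks} trucks/operator -> {after:.1%}")
```

The sketch leaves out the cost of the autonomy hardware and software itself, which would offset part of the saving.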
Host A (15:33): And this brings us to the global conclusion. The era of hard code is ending. Local on-board language models are bursting onto the scene.
Host B (15:41): Benchmarks like MELT-1 have appeared.
Host A (15:44): Robots are leaving sterile zones, copying the physics of our wrists, learning from street chaos, and the value of human intelligence is not wiped out; it's transformed. Knowledge of rigid syntax is devalued.
Host B (15:56): In its place comes systems thinking. And the most striking thing is that this shift is taking hold even in education. LEGO Education is mentioned in the materials.
Host A (16:09): Yes, they've been teaching kids coding for decades, and by 2026 they are radically changing their approach, abandoning hard coding.
Host B (16:17): They're implementing AI assistants, intuitive control through cards. Kids no longer need to memorize commands; they need to learn logic and task formulation.
Host A (16:29): The machine will figure out the motor control itself. This is a perfect reflection of the adult industry. If you analyze this evolution, I have one rather provocative thought.
Host B (16:39): Which?
Host A (16:41): We discussed the startup Generalist. Robots learn physics by copying movements of ordinary workers. They adopt our motor skills simply because that's how we're historically designed.
Host B (16:51): Well yes, we teach them.
Host A (16:52): But the basic property of an algorithm is optimization. What happens in a few years when these systems process a billion hours of our movements and start seeking more efficient paths?
Host B (17:04): Meaning they'd go beyond human physics?
Host A (17:08): Right. Is it possible that machines develop their own completely alien kinematics, new muscle memory, a thousand times more efficient than ours, unconstrained by our joints and fatigue?
Host B (17:20): That sounds creepy.
Host A (17:22): And possibly, visually it will seem broken, frightening, incomprehensible to us. Looking at the pace of AI adaptation, it seems this frightening efficiency is no longer fantasy—it's simply the inevitable next step of evolution.