A New Architecture Asks Robots to Show Their Work
A research team from Tsinghua University, Shanghai Qi Zhi Institute, and Shanghai AI Laboratory has proposed a method to make robots less brittle. Their system, named ThinkAct, adds a step of written reasoning before a robot executes any physical command. Published in late April, the work suggests that this internal 'monologue', in which the model describes the scene and plans its next step in plain language, substantially improves a machine's ability to follow instructions and adapt to new situations.
Current robot control models often map visual data directly to movement. While effective in demonstrations, these systems can fail when an object is moved or the lighting changes. Such a model has no real understanding of the task; it merely repeats a learned pattern. ThinkAct addresses this by borrowing a concept from advanced language models: chain-of-thought. By generating a reasoning trace such as 'The gripper is open above the block. I will lower it to grasp,' the model creates a structured plan before it moves.
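To make the reason-then-act pattern concrete, here is a toy sketch in Python. It is not ThinkAct's implementation, which relies on a learned multimodal model; the `Scene` class, the `think` and `act` functions, and the `PLAN:` tag format are all hypothetical stand-ins. A rule-based `think` writes a plain-language trace ending in a subgoal, and `act` derives the motor command only from that trace, so the text genuinely drives the motion. Because the trace is recomputed from the current scene at every step, moving the target object simply changes the plan rather than breaking a memorized trajectory.

```python
from dataclasses import dataclass


@dataclass
class Scene:
    gripper_x: float           # horizontal gripper position (toy units)
    objects: dict[str, float]  # object name -> horizontal position
    gripper_low: bool = False  # True once lowered to grasping height
    gripper_open: bool = True


def think(scene: Scene, instruction: str) -> str:
    """Hypothetical reasoning step: describe the scene, then state a subgoal after 'PLAN:'."""
    target = next((name for name in scene.objects if name in instruction), None)
    if target is None:
        return "I see no object matching the instruction. PLAN: stop"
    if abs(scene.gripper_x - scene.objects[target]) > 1e-6:
        return f"The gripper is not above the {target}. I will move over it. PLAN: align {target}"
    if not scene.gripper_low:
        return f"The gripper is open above the {target}. I will lower it to grasp. PLAN: lower"
    if scene.gripper_open:
        return f"The {target} is between the fingers. I will close the gripper. PLAN: close"
    return f"The {target} is grasped. PLAN: stop"


def act(trace: str, scene: Scene) -> dict:
    """Hypothetical action decoder: reads only the subgoal named in the trace."""
    plan = trace.rsplit("PLAN: ", 1)[1].split(maxsplit=1)
    if plan[0] == "align":
        return {"dx": scene.objects[plan[1]] - scene.gripper_x}
    if plan[0] == "lower":
        return {"lower": True}
    if plan[0] == "close":
        return {"close": True}
    return {}  # "stop": no motor command


def apply(scene: Scene, command: dict) -> None:
    """Toy physics: update the scene according to the motor command."""
    scene.gripper_x += command.get("dx", 0.0)
    if command.get("lower"):
        scene.gripper_low = True
    if command.get("close"):
        scene.gripper_open = False


def run(scene: Scene, instruction: str, max_steps: int = 10) -> list[str]:
    """Alternate reasoning and acting until the plan says stop."""
    traces = []
    for _ in range(max_steps):
        trace = think(scene, instruction)  # 1. reason in plain language
        traces.append(trace)
        command = act(trace, scene)        # 2. act on the stated plan
        if not command:
            break
        apply(scene, command)
    return traces


if __name__ == "__main__":
    scene = Scene(gripper_x=0.0, objects={"red block": 0.3, "blue block": 0.7})
    for line in run(scene, "pick up the blue block"):
        print(line)
```

Running the script prints four traces: align over the blue block, lower, close, then stop. A real system would replace the hand-written rules with a trained model, but the data flow (observation, then text, then action) is the same idea.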
The results on standardized benchmarks are persuasive. ThinkAct outperformed prior models on sequences of instructions, showing notable gains in completing multi-step tasks without error. Crucially, it generalized better to object arrangements and commands not seen in training. Tests confirmed the reasoning step wasn't just for show; removing it at runtime caused performance to drop, indicating the text actively informs the motor commands.
This approach arrives as companies like Figure AI and Agility Robotics seek robust machines for real-world settings. ThinkAct’s integrated design, producing reasoning and action in one step, avoids the delays of modular systems. However, scaling the method to complex, time-sensitive operations remains unproven.
For business leaders watching AI infrastructure, the implication is clear: the next generation of capable robots may not just react. They may pause, in a computational sense, to articulate a plan. Whether this constitutes true reasoning is a philosophical debate. For practical deployment, the evidence points to a simple conclusion: it makes them work better.
Source: Webpronews