Emerging startup Physical Intelligence isn’t interested in building robots. Instead, the team has something better in mind: powering hardware with a general-purpose ‘brain’ of AI software that continuously learns, allowing existing machines to autonomously perform a growing number of tasks that require precise movements and dexterity, including household chores.
Over the past year, we’ve seen robot dogs dancing, some even equipped to shoot flames, along with increasingly advanced humanoids and machines built for specialized roles on the assembly line. But we’re still waiting for our Rosey the Robot from The Jetsons.
But it may arrive soon. San Francisco’s Physical Intelligence (Pi) has released a general AI model for robotics that can empower existing machines to perform a variety of tasks: in this case, pulling laundry out of the dryer, folding clothes, delicately packing eggs into containers, grinding coffee beans, and ‘bussing’ tables. It’s not a stretch to imagine this system steering a mobile metal helper around the house, vacuuming, packing and unpacking the dishwasher, making the bed, going through the refrigerator and pantry to take inventory of their contents, and planning dinner. And how about cooking dinner, too?
It is this vision that led Pi to unveil its “general-purpose robot foundation model,” known as π₀ (pi-zero).
At Physical Intelligence (π), our mission is to bring general-purpose AI into the physical world.
We are excited to present the first general model π₀ 🧠 🤖, a first step towards this mission.
Paper, blog, uncut videos: https://t.co/XZ4Luk8Dci pic.twitter.com/XHCu1xZJdq
— Physical Intelligence (@physical_int) October 31, 2024
“We believe this is a first step toward our long-term goal of developing artificial physical intelligence, so that users can simply ask robots to perform any task they want, much as they ask large language models (LLMs) and chatbot assistants,” the company explains. “Like LLMs, our model is trained on a wide variety of data and can follow a variety of text instructions. Unlike LLMs, it spans images, text, and actions, learns from the embodied experience of robots, and learns to directly output low-level motor commands through a new architecture. It can control a variety of robots, which can be prompted to perform desired tasks or fine-tuned to specialize in demanding application scenarios.”
In the study, pi-zero demonstrates how AI-trained hardware can perform tasks that demand different levels of dexterity and movement. In total, the model performed 20 tasks requiring a range of skills and manipulations.
“Our goal in selecting these tasks is not to solve specific applications, but to begin to provide models with a general understanding of physical interactions, which is the initial foundation for physical intelligence,” the team says.
π₀ is a VLA generalist.
– Performs dexterous tasks (folding laundry, cleaning tables, etc.).
– Transformer + flow matching combines the benefits of VLM pre-training and continuous action chunks at 50 Hz.
– Pre-trained on a large π dataset spanning a variety of form factors. pic.twitter.com/zX9hvVdQuH
— Physical Intelligence (@physical_int) October 31, 2024
Now, I’m the last person at New Atlas to get excited about robotics, largely because most of what we’ve seen so far has been specialist machines. And, to be honest, I’ve seen enough humanoids moving boxes from point A to point B. In biology, specialists such as bees, butterflies, or koalas occupy one niche and do so very well. That is, until external factors such as habitat loss or disease reveal their limits.
However, while generalist animals such as raccoons or grizzly bears may not occupy any one niche as well as others, they are much better adapted to a wider range of habitats and food sources. This ultimately makes them better suited to dynamic changes in the environment.
Likewise, generalist robots can do more than expertly lay a brick wall. And their ability to learn equips them with ever-evolving skills that allow them to adapt to the diverse challenges of the physical world.
Pi-zero combines internet-scale vision-language model (VLM) pre-training with flow matching to pair AI training with physical movement. Pre-training included 10,000 hours of “dexterous manipulation data” from seven different robot configurations and 68 tasks. This is in addition to existing robot manipulation datasets from OXE, DROID, and Bridge.
We compare π₀ and π₀-small (a non-VLM version) with several previous models.
– Octo and OpenVLA for zero-shot VLA
– ACT and Diffusion Policy for single tasks
It outperforms them on zero-shot tasks, fine-tuning to new tasks, and language following. pic.twitter.com/TUDsFjitDr
— Physical Intelligence (@physical_int) October 31, 2024
“Dexterous robot manipulation requires π₀ to output motor commands at a high frequency, up to 50 times per second,” the research team says. “To provide this level of dexterity, we developed a new method to augment pre-trained VLMs with continuous action outputs through flow matching, a variant of diffusion models. Starting from diverse robot data and a VLM pre-trained on internet-scale data, we train a vision-language-action flow matching model, which can then be post-trained on high-quality robot data to solve a variety of downstream tasks.”
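For readers curious about what flow matching looks like in practice, here is a minimal Python sketch of the sampling step only. It is not Pi's code: the velocity function is a hypothetical stand-in for the trained network, and the chunk length, joint count, and number of integration steps are assumptions chosen for illustration. The only figure taken from the article is the 50 commands per second. The idea is to start from random noise and repeatedly nudge it along a velocity field until it becomes a smooth chunk of motor commands.

```python
import random

CONTROL_HZ = 50   # from the article: up to 50 motor commands per second
CHUNK_LEN = 50    # assumption: one chunk spans one second of motion
NUM_JOINTS = 3    # assumption: a toy arm with three joints
NUM_STEPS = 10    # assumption: number of Euler integration steps


def make_target():
    """Hypothetical 'ideal' chunk: each joint ramps up smoothly over time."""
    return [[0.01 * step * (joint + 1) for joint in range(NUM_JOINTS)]
            for step in range(CHUNK_LEN)]


def toy_velocity(x, t, target):
    """Stand-in for the trained network (hypothetical, not Pi's model).

    On a straight-line path from noise (t=0) to one target (t=1), the
    velocity at position x and time t is (target - x) / (1 - t).
    """
    return [[(goal - value) / (1.0 - t) for value, goal in zip(row, goal_row)]
            for row, goal_row in zip(x, target)]


def sample_action_chunk(target, num_steps=NUM_STEPS, seed=0):
    """Start from Gaussian noise and follow the velocity field to t=1."""
    rng = random.Random(seed)
    x = [[rng.gauss(0.0, 1.0) for _ in row] for row in target]
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k / num_steps  # stays below 1, so (1 - t) is never zero
        velocity = toy_velocity(x, t, target)
        x = [[value + dt * v for value, v in zip(row, v_row)]
             for row, v_row in zip(x, velocity)]
    return x


if __name__ == "__main__":
    chunk = sample_action_chunk(make_target())
    print(f"{len(chunk)} commands, one every {1000 / CONTROL_HZ:.0f} ms")
```

In a real system the velocity would come from a neural network conditioned on camera images and a text instruction, rather than from a known target. The sketch only shows why the output is continuous rather than a sequence of discrete tokens.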
“To our knowledge, this represents the largest pre-training mixture ever used in a robotic manipulation model,” the researchers noted in the study.
Although the company is still in the early stages of research and development, Pi co-founder and CEO Karol Hausman, a scientist who previously worked in robotics at Google, believes that Pi’s foundation model will overcome existing obstacles to generalization, such as the time and money spent training hardware on physical-world data to learn new tasks. The Pi team also includes co-founders Sergey Levine, who pioneered robotics development at Stanford University, and Brian Ichter, a former research scientist at Google.
In 2023, satirist and architect Karl Sharro went viral with his tweet: “Humans doing the hard jobs on minimum wage while the robots write poetry and paint is not the future I wanted.” That same year, Hollywood came to a halt when members of the Writers Guild of America went on strike, recognizing how bleak the future looked for creatives facing this new technological age.
And while AI is still coming, and has already come, for many of our professions (you don’t need to remind journalists of this), Pi’s vision feels more in line with that of mid-20th century futurists: a world where machines make our lives more convenient. You might call it naive, but I can live with robots coming to do my housework.
You can see more videos of the tasks the team trained pi-zero to perform in the Pi blog post, but here’s one showing off some of its impressive, delicate work.
Processed Egg Sorting
A research paper on pi-zero’s development and training can be found here.
Source: Physical Intelligence