


Video Friday is your weekly selection of awesome robotics videos, collected by your friends at IEEE Spectrum robotics. We also post a weekly calendar of upcoming robotics events for the next few months. Please send us your events for inclusion.

Cybathlon Challenges: 2 February 2024, ZURICH
Eurobot Open 2024: 8–11 May 2024, LA ROCHE-SUR-YON, FRANCE
ICRA 2024: 13–17 May 2024, YOKOHAMA, JAPAN
RoboCup 2024: 17–22 July 2024, EINDHOVEN, NETHERLANDS

Enjoy today’s videos!

You may not be familiar with Swiss-Mile, but you’d almost certainly recognize its robot: it’s the ANYmal with wheels on its feet that can do all kinds of amazing things. Swiss-Mile has just announced a seed round to commercialize these capabilities across quadrupedal platforms, including Unitree’s, which means it’s even affordable-ish!

It’s always so cool to see impressive robotics research move toward commercialization, and I’ve already started saving up for one of these of my own.

[ Swiss-Mile ]

Thanks Marko!

This video presents the capabilities of PAL Robotics’ TALOS robot as it demonstrates agile and robust walking using Model Predictive Control (MPC) references sent to a Whole-Body Inverse Dynamics (WBID) controller developed in collaboration with Dynamograde. The footage shows TALOS navigating various challenging terrains, including stairs and slopes, while handling unexpected disturbances and additional weight.

[ PAL Robotics ]

Thanks Lorna!

Do you want to create a spectacular bimanual manipulation demo? All it takes is this teleoperation system and a carefully cropped camera shot! This is based on the Mobile ALOHA system from Stanford that we featured in Video Friday last week.

[ AgileX ]

Wing is still trying to make the drone-delivery thing work, and it’s got a new, bigger drone to deliver even more stuff at once.

[ Wing ]

A lot of robotics research claims to be about search and rescue and disaster relief, but it really looks like RSL’s ANYmal can actually pull it off.

And here’s an even more impressive video, along with some detail about how the system works.

[ Paper ]

This might be the most appropriate soundtrack for a robot video that I’ve ever heard.

Snakes have long captivated robotics researchers due to their effective locomotion, flexible body structure, and ability to adapt their skin friction to different terrains. While extensive research has delved into serpentine locomotion, there remains a gap in exploring rectilinear locomotion as a robotic solution for navigating through narrow spaces. In this study, we describe the fundamental principles of rectilinear locomotion and apply them to design a soft crawling robot using origami modules constructed from laminated fabrics.

[ SDU ]
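
For readers curious what rectilinear locomotion looks like in code, here is a minimal sketch (not the authors' implementation) of the coordination idea: a contraction wave that travels along a chain of modules, anchoring the contracted ones while the extended ones slide forward. The module count, wave width, and timing are illustrative assumptions.

```python
# Minimal sketch (not the authors' code): rectilinear (earthworm-like) locomotion
# boils down to a contraction wave traveling along a chain of modules, so that the
# contracted modules anchor while the extended ones slide forward. Module count,
# wave width, and step count below are illustrative.

def rectilinear_gait(n_modules: int = 6, n_steps: int = 12, wave_width: int = 2):
    """Yield one actuation vector per step: 1.0 = contracted (anchored), 0.0 = extended."""
    for step in range(n_steps):
        head = step % n_modules                       # leading edge of the contraction wave
        pattern = [0.0] * n_modules
        for offset in range(wave_width):              # contract a short block behind the head
            pattern[(head - offset) % n_modules] = 1.0
        yield pattern

if __name__ == "__main__":
    for t, commands in enumerate(rectilinear_gait()):
        print(f"t={t:02d}  modules={commands}")
```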

We wrote about Fotokite’s innovative tethered drone seven or eight years ago, and it’s good to see the company is still doing solid work.

I do miss the consumer version, though.

[ Fotokite ]

[ JDP ] via [ Petapixel ]

This is SHIVAA, the strawberry-picking robot of the DFKI Robotics Innovation Center. The system is being developed in the RoLand (Robotic Systems in Agriculture) project, coordinated by the Robotics Innovation Center (RIC) of DFKI Bremen. Within the project, we design and develop a semi-autonomous mobile system that is capable of harvesting strawberries independently of human interaction.

[ DFKI ]

On December 6, 2023, Demarcus Edwards talked to Robotics students as a speaker in the Undergraduate Robotics Pathways & Careers Speaker Series, which aims to answer the question: “What can I do with a robotics degree?”

[ Michigan Robotics ]

This movie, Loss of Sensation, was released in Russia in 1935. It seems to be the movie that really, really irritated Karel Čapek, because they made his “robots” into mechanical beings instead of biological ones.

[ IMDB ]

Experiments on physical continuum robots are the gold standard for evaluation. Currently, as no commercial continuum robot platform is available, a large variety of early-stage prototypes exists. These prototypes are developed by individual research groups and are often used for a single publication. Thus, a significant amount of time is devoted to creating proprietary hardware and software, hindering the development of a common platform and diverting scarce time and effort from the main research challenges. We address this problem by proposing an open-source actuation module, which can be used to build different types of continuum robots. It consists of a high-torque brushless electric motor, a high-resolution optical encoder, and a low-gear-ratio transmission. For this article, we create three different types of continuum robots. In addition, we illustrate, for the first time, that continuum robots built with our actuation module can proprioceptively detect external forces. Consequently, our approach opens untapped and under-investigated research directions related to the dynamics and advanced control of continuum robots, where sensing the generalized flow and effort is mandatory. Beyond that, we democratize continuum robotics research by providing open-source software and hardware through our initiative, the Open Continuum Robotics Project, to increase the accessibility and reproducibility of advanced methods.
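
To make the proprioceptive force-sensing claim concrete, here is a rough sketch of how a low-gear-ratio (quasi-direct-drive) actuator can estimate external torque from its own motor current by subtracting a simple internal model. This is a generic illustration with invented constants, not the Open Continuum Robotics Project's actual method.

```python
# Illustrative sketch only (not the paper's method): with a low-gear-ratio,
# high-torque motor, friction and reflected inertia are small enough that external
# load can be inferred by comparing current-implied torque against a simple internal
# model. All constants below are invented for the example.

KT = 0.09          # motor torque constant [Nm/A] (assumed)
GEAR_RATIO = 6.0   # low gear ratio keeps the drivetrain "transparent"
INERTIA = 0.02     # joint-side inertia [kg m^2] (assumed)
DAMPING = 0.005    # viscous friction coefficient [Nm s/rad] (assumed)

def external_torque(current_a: float, vel: float, accel: float) -> float:
    """Estimate external joint torque from measured motor current, velocity, and acceleration."""
    tau_commanded = GEAR_RATIO * KT * current_a        # torque delivered at the joint
    tau_internal = INERTIA * accel + DAMPING * vel     # torque the model expects to spend
    return tau_commanded - tau_internal                # the remainder is attributed to contact

if __name__ == "__main__":
    # A load suddenly appearing on the tendon shows up as a nonzero external-torque estimate.
    print(external_torque(current_a=1.2, vel=0.5, accel=0.0))
```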

Agriculture 4.0 presents several challenges for the automation of various operations, including the fundamental task of harvesting. One of the crucial aspects in the automatic harvesting of high-value crops is the gripping and detachment of delicate fruits without spoiling them or interfering with the environment. Soft robotic systems, particularly soft grippers, offer a promising solution for this problem, as they can operate in unstructured environments, manipulate objects delicately, and interact safely with humans. In this context, this article presents a soft gripper design for harvesting as well as for pick-and-place operations of small and medium-sized fruits. The gripper is fabricated by 3D printing with a flexible thermoplastic elastomer filament. This approach enables the production of an economical, compact, easily replicable, and interchangeable gripper by utilizing soft robotics principles, such as flexible structures and pneumatic actuation.

Soft robots are characterized by their mechanical compliance, making them well suited for various bio-inspired applications. However, the challenge of preserving their flexibility during deployment has necessitated soft sensors, which can enhance their mobility, energy efficiency, and spatial adaptability. By emulating the structure, strategies, and working principles of human senses, soft robots can detect stimuli without direct contact using soft touchless sensors, as well as tactile stimuli, and this has driven noteworthy progress within the field of soft robotics. Soft touchless sensors offer the advantage of non-invasive sensing and gripping without the drawbacks linked to physical contact. Consequently, their popularity has grown in recent years, as they facilitate intuitive and safe interactions with humans, other robots, and the surrounding environment. This review explores the emerging confluence of touchless sensing and soft robotics, outlining a roadmap for deployable soft robots that achieve human-level dexterity.



You’re familiar with Karel Čapek, right? If not, you should be—he’s the guy who (along with his brother Josef) invented the word “robot.” Čapek introduced robots to the world in 1921, when his play “R.U.R.” (subtitled “Rossum’s Universal Robots”) was first performed in Prague. It was performed in New York City the next year, and by the year after that, it had been translated into 30 languages. Translated, that is, except for the word “robot” itself, which originally described artificial humans but within a decade of its introduction came to mean things that were mechanical and electronic in nature.

Čapek, it turns out, was a little miffed that his “robots” had been so hijacked, and in 1935, he wrote a column in the Lidové noviny “defending” his vision of what robots should be, while also resigning himself to what they had become. A new translation of this column is included as an afterword in a new English translation of R.U.R. that is accompanied by 20 essays exploring robotics, philosophy, politics, and AI in the context of the play, and it makes for fascinating reading.

R.U.R. and the Vision of Artificial Life is edited by Jitka Čejková, a professor at the Chemical Robotics Laboratory at the University of Chemistry and Technology Prague, whose research interests arguably make her one of the most qualified people to write about Čapek’s perspective on robots. “The chemical robots in the form of microparticles that we designed and investigated, and that had properties similar to living cells, were much closer to Čapek’s original ideas than any other robots today,” Čejková explains in the book’s introduction. These microparticles can exhibit surprisingly complex autonomous behaviors in specific situations, like solving simple mazes:

“I started to call these droplets liquid robots,” says Čejková. “Just as Rossum’s robots were artificial human beings that only looked like humans and could imitate only certain characteristics and behaviors of humans, so liquid robots, as artificial cells, only partially imitate the behavior of their living counterparts.”

What is or is not called a robot is an ongoing debate that most roboticists seem to try to avoid, but personally, I appreciate the idea that very broadly, a robot is something that seems alive but isn’t—something with independent embodied intelligence. Perhaps the requirement that a robot is mechanical and electronic is too strict, although as Čapek himself realized a hundred years ago, what defines a robot has escaped from the control of anyone, even its creator. Here then is his column from 1935, excerpted from R.U.R. and the Vision of Artificial Life, released just today:

“THE AUTHOR OF THE ROBOTS DEFENDS HIMSELF”
By Karel Čapek
Published in Lidové noviny, June 9, 1935

I know it is a sign of ingratitude on the part of the author, if he raises both hands against a certain popularity that has befallen something which is called his spiritual brainchild; for that matter, he is aware that by doing so he can no longer change a thing. The author was silent a goodly time and kept his own counsel, while the notion that robots have limbs of metal and innards of wire and cogwheels (or the like) has become current; he has learned, without any great pleasure, that genuine steel robots have started to appear, robots that move in various directions, tell the time, and even fly airplanes; but when he recently read that, in Moscow, they have shot a major film, in which the world is trampled underfoot by mechanical robots, driven by electromagnetic waves, he developed a strong urge to protest, at least in the name of his own robots. For his robots were not mechanisms. They were not made of sheet metal and cogwheels. They were not a celebration of mechanical engineering. If the author was thinking of any of the marvels of the human spirit during their creation, it was not of technology, but of science. With outright horror, he refuses any responsibility for the thought that machines could take the place of people, or that anything like life, love, or rebellion could ever awaken in their cogwheels. He would regard this somber vision as an unforgivable overvaluation of mechanics or as a severe insult to life.

The author of the robots appeals to the fact that he must know the most about it: and therefore he pronounces that his robots were created quite differently—that is, by a chemical path. The author was thinking about modern chemistry, which in various emulsions (or whatever they are called) has located substances and forms that in some ways behave like living matter. He was thinking about biological chemistry, which is constantly discovering new chemical agents that have a direct regulatory influence on living matter; about chemistry, which is finding—and to some extent already building—those various enzymes, hormones, and vitamins that give living matter its ability to grow and multiply and arrange all the other necessities of life. Perhaps, as a scientific layman, he might develop an urge to attribute this patient ingenious scholarly tinkering with the ability to one day produce, by artificial means, a living cell in the test tube; but for many reasons, amongst which also belonged a respect for life, he could not resolve to deal so frivolously with this mystery. That is why he created a new kind of matter by chemical synthesis, one which simply behaves a lot like the living; it is an organic substance, different from that from which living cells are made; it is something like another alternative to life, a material substrate in which life could have evolved if it had not, from the beginning, taken a different path. We do not have to suppose that all the different possibilities of creation have been exhausted on our planet. The author of the robots would regard it as an act of scientific bad taste if he had brought something to life with brass cogwheels or created life in the test tube; the way he imagined it, he created only a new foundation for life, which began to behave like living matter, and which could therefore have become a vehicle of life—but a life which remains an unimaginable and incomprehensible mystery. This life will reach its fulfillment only when (with the aid of considerable inaccuracy and mysticism) the robots acquire souls. From which it is evident that the author did not invent his robots with the technological hubris of a mechanical engineer, but with the metaphysical humility of a spiritualist.

Well then, the author cannot be blamed for what might be called the worldwide humbug over the robots. The author did not intend to furnish the world with plate metal dummies stuffed with cogwheels, photocells, and other mechanical gizmos. It appears, however, that the modern world is not interested in his scientific robots and has replaced them with technological ones; and these are, as is apparent, the true flesh-of-our-flesh of our age. The world needed mechanical robots, for it believes in machines more than it believes in life; it is fascinated more by the marvels of technology than by the miracle of life. For which reason, the author who wanted—through his insurgent robots, striving for a soul—to protest against the mechanical superstition of our times, must in the end claim something which nobody can deny him: the honor that he was defeated.

Excerpted from R.U.R. and the Vision of Artificial Life, by Karel Čapek, edited by Jitka Čejková. Published by The MIT Press. Copyright © 2024 MIT. All rights reserved.




This paper presents and discusses the development and deployment of a tour guide robot as part of the 5G-TOURS EU research project, which aims to develop applications enabled by 5G technology in different use cases. The objective is the development of an autonomous robotic application in which intelligence is off-loaded to a remote machine via the 5G network, so as to lift most of the computational load from the robot itself. The application uses components that have been widely studied in robotics (i.e., localization, mapping, planning, interaction). However, the characteristics of the network and interactions with visitors in the wild introduce specific problems which must be taken into account. The paper discusses such problems in detail, summarizing the main results achieved from both the methodological and the experimental standpoint, and is completed by a description of the general functional architecture of the whole system, including navigation and operational services. The software implementation is also publicly available.



Video Friday is your weekly selection of awesome robotics videos, collected by your friends at IEEE Spectrum robotics. We also post a weekly calendar of upcoming robotics events for the next few months. Please send us your events for inclusion.

Cybathlon Challenges: 2 February 2024, ZURICH
Eurobot Open 2024: 8–11 May 2024, LA ROCHE-SUR-YON, FRANCE
ICRA 2024: 13–17 May 2024, YOKOHAMA, JAPAN
RoboCup 2024: 17–22 July 2024, EINDHOVEN, NETHERLANDS

Enjoy today’s videos!

Figure’s robot is watching videos of humans making coffee, and then making coffee on its own.

While this is certainly impressive, just be aware that it’s not at all clear from the video exactly how impressive it is.

[ Figure ]

It’s really the shoes that get me with Westwood’s THEMIS robot.

THEMIS can also deliver a package just as well as a human can, if not better!

And I appreciate the inclusion of all of these outtakes, too:

[ Westwood Robotics ]

Kepler Exploration Robot recently unveiled its latest innovation, the Kepler Forerunner series of general-purpose humanoid robots. This advanced humanoid stands at a height of 178cm (5’10”), weighs 85kg (187 lbs.), and boasts an intelligent and dexterous hand with 12 degrees of freedom. The entire body has up to 40 degrees of freedom, enabling functionalities such as navigating complex terrains, intelligent obstacle avoidance, flexible manipulation of hands, powerful lifting and carrying of heavy loads, hand-eye coordination, and intelligent interactive communication.

[ Kepler Exploration ]

Introducing the new Ballie, your true AI companion. With more advanced intelligence, Ballie can come right to you and project visuals on your walls. It can also help you interact with other connected devices or take care of hassles.

[ Samsung ]

There is a thing called Drone Soccer that got some exposure at CES this week, but apparently it’s been around for several years and originated in South Korea. It’s inspired by Quidditch and targeted at STEM students.

[ Drone Soccer ]

Every so often, JPL dumps a bunch of raw footage onto YouTube. This time, there’s Perseverance’s view of Ingenuity taking off, a test of the EELS robot, and an unusual sample tube drop test.

[ JPL ]

Our first months delivering to Walmart customers have made one thing clear: Demand for drone delivery is real. On the heels of our Dallas-wide FAA approvals, today we announced that millions of new DFW-area customers will have access to drone delivery in 2024!

[ Wing ]

Dave Burke works with Biomechatronics researcher Michael Fernandez to test a prosthesis with neural control, by cutting a sheet of paper with scissors. This is the first time in 30 years that Dave has performed this task with his missing hand.

[ MIT ]

Meet DJI’s first delivery drone—FlyCart 30. Overcome traditional transport challenges and start a new era of dynamic aerial delivery with large payload capacity, long operation range, high reliability, and intelligent features.

[ DJI ]

The Waymo Driver autonomously operating both a passenger vehicle and class 8 truck safely in various freeway scenarios, including on-ramps and off-ramps, lane merges, and sharing the road with others.

[ Waymo ]

In this paper, we present DiffuseBot, a physics-augmented diffusion model that generates soft robot morphologies capable of excelling in a wide spectrum of tasks. DiffuseBot bridges the gap between virtually generated content and physical utility by (i) augmenting the diffusion process with a physical dynamical simulation which provides a certificate of performance, and (ii) introducing a co-design procedure that jointly optimizes physical design and control by leveraging information about physical sensitivities from differentiable simulation.

[ Paper ]
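
As a rough illustration of the co-design loop described above (propose a morphology, score it in simulation, and feed the physical sensitivities back into the next proposal), here is a toy sketch. The `simulate` objective and the noisy gradient step are stand-ins for DiffuseBot's differentiable simulator and guided diffusion sampling, not the paper's code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the paper's components: "simulate" is a made-up differentiable
# objective, and the noisy gradient step is a crude analogue of physics-guided
# sampling. Only the loop structure (propose -> simulate -> score -> guide) mirrors
# the co-design idea; nothing here is DiffuseBot's actual model or simulator.

def simulate(design: np.ndarray):
    """Return a performance score and its gradient for a candidate morphology vector."""
    target = np.linspace(-1.0, 1.0, design.size)       # arbitrary "good" morphology
    return -np.sum((design - target) ** 2), -2.0 * (design - target)

design = rng.normal(size=8)                            # initial sampled design
for iteration in range(200):
    score, grad = simulate(design)                     # "certificate of performance" + sensitivities
    noise = rng.normal(scale=0.05, size=design.size)   # keeps proposals diverse, diffusion-style
    design = design + 0.05 * grad + noise              # guide the next proposal with physics info

print(f"final simulated score: {simulate(design)[0]:.3f}")
```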




Introduction: The modern worldwide trend toward sedentary behavior comes with significant health risks. An accompanying wave of health technologies has tried to encourage physical activity, but these approaches often yield limited use and retention. Due to their unique ability to serve as both a health-promoting technology and a social peer, we propose robots as a game-changing solution for encouraging physical activity.

Methods: This article analyzes the eight exergames we previously created for the Rethink Baxter Research Robot in terms of four key components that are grounded in the video-game literature: repetition, pattern matching, music, and social design. We use these four game facets to assess gameplay data from 40 adult users who each experienced the games in balanced random order.

Results: In agreement with prior research, our results show that relevant musical cultural references, recognizable social analogues, and gameplay clarity are good strategies for taking an otherwise highly repetitive physical activity and making it engaging and popular among users.

Discussion: Others who study socially assistive robots and rehabilitation robotics can benefit from this work by considering the presented design attributes to generate future hypotheses and by using our eight open-source games to pursue follow-up work on social-physical exercise with robots.

Human-robot cooperation (HRC) is becoming increasingly relevant with the surge in collaborative robots (cobots) for industrial applications. Examples of humans and robots cooperating actively on the same workpiece can be found in research labs around the world, but industrial applications are still mostly limited to robots and humans taking turns. In this paper, we use a cooperative lifting task (co-lift) as a case study to explore how well this task can be learned within a limited time, and how background factors of users may impact learning. The experimental study included 32 healthy adults, aged 20 to 54 years, who performed a co-lift with a collaborative robot. The physical setup is designed as a gamified user-training system, since research has validated gamification as an effective methodology for user training. Human motions and gestures were measured using inertial measurement unit (IMU) sensors and used to interact with the robot across three role distributions: human as the leader, robot as the leader, and shared leadership. We find that regardless of age, gender, job category, gaming background, and familiarity with robots, the learning curves of all users showed satisfactory progression, and all users could achieve successful cooperation with the robot on the co-lift task after seven or fewer trials. The data indicate that some background factors of the users, such as occupation and past gaming habits, may affect learning outcomes, which will be explored further in future experiments. Overall, the results indicate that the potential for adoption of HRC in industry is promising for a diverse set of users after a relatively short training process.

Inflatable fabric beams (IFBs) integrating pleat folds can generate complex motion by modifying the pleat characteristics (e.g., dimensions, orientations). However, the capability of the IFB to return to the folded configuration relies upon the elasticity of the fabrics, requiring additional pressure inputs or complementary mechanisms. Using soft compliant elements (SCEs) assembled onto pleat folds is an appealing approach to improving the IFB elasticity and providing a range of spatial configurations when pressurized. This study introduces an actuator comprising an IFB with pleat folds and SCEs. By methodologically assembling the SCEs onto the pleat folds, we constrain the IFB unfolding to achieve out-of-plane motion at 5 kPa. Besides, the proposed actuator can generate angular displacement by regulating the input pressure (> 5 kPa). A matrix-based representation and model are proposed to analyze the actuator motion. We experimentally study the actuator’s angular displacement by modifying SCE shapes, fold dimensions, and assembly distances of SCEs. Moreover, we analyze the effects of incorporating two SCEs onto a pleat fold. Our results show that the actuator motion can be tuned by integrating SCEs with different stiffness and varying the pleat fold dimensions. In addition, we demonstrate that the integration of two SCEs onto the pleat fold permits the actuator to return to its folded configuration when depressurized. In order to demonstrate the versatility of the proposed actuator, we devise and conduct experiments showcasing the implementation of a planar serial manipulator and a soft gripper with two grasping modalities.

In the development of dialogue systems for android robots, the goal is to achieve human-like communication. However, subtle differences between android robots and humans are noticeable, leading even human-like android robots to be perceived differently. Understanding how humans accept android robots and optimizing their behavior is crucial. Generally, human customers have various expectations and anxieties when interacting with a robotic salesclerk instead of a human. Asymmetric communication arises when android robots treat customers like humans while customers treat robots as machines. Focusing on human-robot interaction in a tourist guide scenario, we propose in this paper an asymmetric communication strategy that does not use estimation technology for preference information, but instead changes the agent’s character in order to pretend to tailor the dialogue to the customer. In line with this, we prepared an experimental method to evaluate asymmetric communication strategies, using video clips to simulate dialogues. Participants completed questionnaires without prior knowledge of whether the salesclerk was human-like or robotic. The method allowed us to assess how participants treated the salesclerk and the effectiveness of the asymmetric communication strategy. Additionally, during our demonstration in a dialogue robot competition, 29 visitors had a positive impression of the android robot’s asymmetric communication strategy and reported a high level of satisfaction with the dialogue.

Introduction: In the current landscape marked by swift digital transformations and global disruptions, comprehending the intersection of digitalization and sustainable business practices is imperative. This study focuses on the food industries of China and Pakistan, aiming to explore the influence of digitalization on cleaner production.

Methods: Employing a cross-sectional design, data were gathered through online surveys involving a diverse group of employees. Special attention was given to the emergent phenomenon of technostress and its subsequent implications for individuals in the workplace.

Results: The findings of the study demonstrate a significant impact of digitalization on both resource mobilization and interaction quality within the surveyed food industries. Notably, technostress emerged as a mediating factor, shedding light on the psychological challenges associated with digital transitions. The study further reveals the moderating role of the COVID-19 pandemic, altering the dynamics among the variables under investigation.

Discussion: From a theoretical perspective, this research contributes to the cleaner production literature by bridging it with the human-centric nuances of technological adaptation. On a practical level, the study emphasizes the importance of aligning digital strategies with resource mobilization to achieve sustainable outcomes. For the food industry and potentially beyond, the research offers a roadmap for integrating digital tools into operations, ensuring efficiency, and promoting cleaner production.

A basic assumption in most approaches to simultaneous localization and mapping (SLAM) is the static nature of the environment. In recent years, some research has been devoted to the field of SLAM in dynamic environments. However, most of the studies conducted in this field have implemented SLAM by removing and filtering the moving landmarks. Moreover, the use of several robots in large, complex, and dynamic environments can significantly improve performance on the localization and mapping task, which has attracted many researchers to this problem more recently. In multi-robot SLAM, the robots can cooperate in a decentralized manner without the need for a central processing center to obtain their positions and a more precise map of the environment. In this article, a new decentralized approach is presented for multi-robot SLAM problems in dynamic environments with unknown initial correspondence. The proposed method applies a modified Fast-SLAM method, which implements SLAM in a decentralized manner by considering moving landmarks in the environment. Due to the unknown initial correspondence of the robots, a geographical approach is embedded in the proposed algorithm to align and merge their maps. Data association is also embedded in the algorithm; this is performed using the measurement predictions in the SLAM process of each robot. Finally, simulation results are provided to demonstrate the performance of the proposed method.
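
Map merging with unknown initial correspondence ultimately comes down to estimating the rigid transform between two robots' frames once some common landmarks have been identified. The sketch below shows a generic 2D Kabsch/Procrustes alignment of matched landmark estimates; it illustrates the idea only and is not the paper's specific geographical alignment method.

```python
import numpy as np

def align_maps(landmarks_a, landmarks_b):
    """Rigid 2D transform (R, t) mapping robot B's landmark estimates onto robot A's frame,
    given the same landmarks observed by both robots (Kabsch/Procrustes alignment)."""
    a = np.asarray(landmarks_a, float)
    b = np.asarray(landmarks_b, float)
    mu_a, mu_b = a.mean(axis=0), b.mean(axis=0)
    H = (b - mu_b).T @ (a - mu_a)                     # cross-covariance of centered landmark sets
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))            # guard against reflections
    R = Vt.T @ np.diag([1.0, d]) @ U.T
    t = mu_a - R @ mu_b
    return R, t

if __name__ == "__main__":
    truth = np.array([[0.0, 0.0], [2.0, 1.0], [3.0, -1.0], [5.0, 2.0]])   # landmarks in A's frame
    theta = 0.4
    R_true = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
    seen_by_b = (truth - np.array([1.0, 2.0])) @ R_true.T                 # same landmarks, B's frame
    R, t = align_maps(truth, seen_by_b)
    merged = seen_by_b @ R.T + t                                          # B's map expressed in A's frame
    print(np.allclose(merged, truth))
```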

Smart haptic gloves are a new technology emerging in Virtual Reality (VR) with a promise to enhance sensory feedback in VR. This paper presents one of the first attempts to explore their application to surgical training for neurosurgery trainees using VR-based surgery simulators. We develop and evaluate a surgical simulator for External Ventricular Drain (EVD) placement, a common procedure in the field of neurosurgery. Haptic gloves are used in combination with a VR environment to augment the experience of burr hole placement and flexible catheter manipulation. The simulator was integrated into the training curriculum at the 2022 Canadian Neurosurgery Rookie Bootcamp, where thirty neurosurgery residents used it and objective performance metrics and subjective experience scores were acquired. We provide the details of the simulator development as well as the user-study results, and draw conclusions on the benefits added by the haptic gloves and future directions.

Introduction: Image-based heart rate estimation technology offers a contactless approach to healthcare monitoring that could improve the lives of millions of people. In order to comprehensively test or optimize image-based heart rate extraction methods, the dataset should contain a large number of factors such as body motion, lighting conditions, and physiological states. However, collecting high-quality datasets with complete parameters is a huge challenge.

Methods: In this paper, we introduce a bionic human model based on a three-dimensional (3D) representation of the human body. By integrating synthetic cardiac signal and body involuntary motion into the 3D model, five well-known traditional and four deep learning iPPG (imaging photoplethysmography) extraction methods are used to test the rendered videos.

Results: To compare with different situations in the real world, four common scenarios (stillness, expression/talking, light source changes, and physical activity) are created for each 3D human. The 3D human can be built with any appearance and different skin tones. A high degree of agreement is achieved between the signals extracted from videos with the synthetic human and videos with a real human: the performance advantages and disadvantages of the selected iPPG methods are consistent for both real and 3D humans.

Discussion: This technology has the capability to generate synthetic humans within various scenarios, utilizing precisely controlled parameters and disturbances. Furthermore, it holds considerable potential for testing and optimizing image-based vital signs methods in challenging situations where real people with reliable ground truth measurements are difficult to obtain, such as in drone rescue.
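
For context on what a "traditional" iPPG extraction method involves, here is a minimal baseline assuming you already have the per-frame mean green intensity of a face region: detrend, band-pass to plausible heart-rate frequencies, and take the spectral peak. This is a generic illustration, not necessarily one of the nine methods evaluated in the paper.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def heart_rate_from_roi_means(green_means, fs=30.0):
    """Estimate heart rate (bpm) from per-frame mean green intensity of a face ROI.
    A minimal 'traditional iPPG' baseline: detrend, band-pass to 0.7-4 Hz, take the FFT peak."""
    x = np.asarray(green_means, float)
    x = x - x.mean()                                         # remove the DC component
    b, a = butter(3, [0.7, 4.0], btype="band", fs=fs)        # plausible heart-rate band (42-240 bpm)
    filtered = filtfilt(b, a, x)
    spectrum = np.abs(np.fft.rfft(filtered))
    freqs = np.fft.rfftfreq(filtered.size, d=1.0 / fs)
    band = (freqs >= 0.7) & (freqs <= 4.0)
    return 60.0 * freqs[band][np.argmax(spectrum[band])]

if __name__ == "__main__":
    fs, hr_hz = 30.0, 1.2                                    # synthetic 72 bpm pulse at 30 fps
    t = np.arange(0, 30, 1.0 / fs)
    signal = 0.5 * np.sin(2 * np.pi * hr_hz * t) + np.random.default_rng(1).normal(0, 0.2, t.size)
    print(round(heart_rate_from_roi_means(signal, fs)))      # ~72
```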



The generative AI revolution embodied in tools like ChatGPT, Midjourney, and many others is at its core based on a simple formula: Take a very large neural network, train it on a huge dataset scraped from the Web, and then use it to fulfill a broad range of user requests. Large language models (LLMs) can answer questions, write code, and spout poetry, while image-generating systems can create convincing cave paintings or contemporary art.

So why haven’t these amazing AI capabilities translated into the kinds of helpful and broadly useful robots we’ve seen in science fiction? Where are the robots that can clean off the table, fold your laundry, and make you breakfast?

Unfortunately, the highly successful generative AI formula—big models trained on lots of Internet-sourced data—doesn’t easily carry over into robotics, because the Internet is not full of robotic-interaction data in the same way that it’s full of text and images. Robots need robot data to learn from, and this data is typically created slowly and tediously by researchers in laboratory environments for very specific tasks. Despite tremendous progress on robot-learning algorithms, without abundant data we still can’t enable robots to perform real-world tasks (like making breakfast) outside the lab. The most impressive results typically only work in a single laboratory, on a single robot, and often involve only a handful of behaviors.

If the abilities of each robot are limited by the time and effort it takes to manually teach it to perform a new task, what if we were to pool together the experiences of many robots, so a new robot could learn from all of them at once? We decided to give it a try. In 2023, our labs at Google and the University of California, Berkeley came together with 32 other robotics laboratories in North America, Europe, and Asia to undertake the RT-X project, with the goal of assembling data, resources, and code to make general-purpose robots a reality.

Here is what we learned from the first phase of this effort.

How to create a generalist robot

Humans are far better at this kind of learning. Our brains can, with a little practice, handle what are essentially changes to our body plan, which happens when we pick up a tool, ride a bicycle, or get in a car. That is, our “embodiment” changes, but our brains adapt. RT-X is aiming for something similar in robots: to enable a single deep neural network to control many different types of robots, a capability called cross-embodiment. The question is whether a deep neural network trained on data from a sufficiently large number of different robots can learn to “drive” all of them—even robots with very different appearances, physical properties, and capabilities. If so, this approach could potentially unlock the power of large datasets for robotic learning.

The scale of this project is very large because it has to be. The RT-X dataset currently contains nearly a million robotic trials for 22 types of robots, including many of the most commonly used robotic arms on the market. The robots in this dataset perform a huge range of behaviors, including picking and placing objects, assembly, and specialized tasks like cable routing. In total, there are about 500 different skills and interactions with thousands of different objects. It’s the largest open-source dataset of real robotic actions in existence.

Surprisingly, we found that our multirobot data could be used with relatively simple machine-learning methods, provided that we follow the recipe of using large neural-network models with large datasets. Leveraging the same kinds of models used in current LLMs like ChatGPT, we were able to train robot-control algorithms that do not require any special features for cross-embodiment. Much like a person can drive a car or ride a bicycle using the same brain, a model trained on the RT-X dataset can simply recognize what kind of robot it’s controlling from what it sees in the robot’s own camera observations. If the robot’s camera sees a UR10 industrial arm, the model sends commands appropriate to a UR10. If the model instead sees a low-cost WidowX hobbyist arm, the model moves it accordingly.
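
To picture what "one model, many robots" means at the software level, here is a hypothetical interface sketch (not the RT-X codebase): the same policy receives a camera image and an instruction, and only the mapping from its normalized output into each robot's physical action limits differs per robot.

```python
from dataclasses import dataclass
from typing import Callable
import numpy as np

# Hypothetical interface sketch (not the RT-X codebase): a single policy maps
# (camera image, language instruction) -> a normalized action, and each robot only
# differs in how that normalized action is scaled into its own physical limits.

@dataclass
class RobotSpec:
    name: str
    action_low: np.ndarray    # per-dimension lower bounds (e.g., xyz, rotation, gripper)
    action_high: np.ndarray   # per-dimension upper bounds

def denormalize(action_unit: np.ndarray, spec: RobotSpec) -> np.ndarray:
    """Map a policy output in [-1, 1] into this robot's physical action range."""
    return spec.action_low + (action_unit + 1.0) / 2.0 * (spec.action_high - spec.action_low)

def control_step(policy: Callable[[np.ndarray, str], np.ndarray],
                 image: np.ndarray, instruction: str, spec: RobotSpec) -> np.ndarray:
    # The policy is never told which robot it is driving; as the article describes,
    # it has to infer the embodiment from what it sees in the camera image.
    return denormalize(policy(image, instruction), spec)

if __name__ == "__main__":
    dummy_policy = lambda image, text: np.tanh(np.full(7, 0.3))   # stand-in for the learned model
    ur10 = RobotSpec("UR10", -0.05 * np.ones(7), 0.05 * np.ones(7))
    widowx = RobotSpec("WidowX", -0.02 * np.ones(7), 0.02 * np.ones(7))
    frame = np.zeros((224, 224, 3))
    print(control_step(dummy_policy, frame, "pick up the sponge", ur10))
    print(control_step(dummy_policy, frame, "pick up the sponge", widowx))
```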

To test the capabilities of our model, five of the laboratories involved in the RT-X collaboration each tested it in a head-to-head comparison against the best control system they had developed independently for their own robot. Each lab’s test involved the tasks it was using for its own research, which included things like picking up and moving objects, opening doors, and routing cables through clips. Remarkably, the single unified model provided improved performance over each laboratory’s own best method, succeeding at the tasks about 50 percent more often on average.

While this result might seem surprising, we found that the RT-X controller could leverage the diverse experiences of other robots to improve robustness in different settings. Even within the same laboratory, every time a robot attempts a task, it finds itself in a slightly different situation, and so drawing on the experiences of other robots in other situations helped the RT-X controller with natural variability and edge cases. Here are a few examples of the range of these tasks:




Building robots that can reason

Encouraged by our success with combining data from many robot types, we next sought to investigate how such data can be incorporated into a system with more in-depth reasoning capabilities. Complex semantic reasoning is hard to learn from robot data alone. While the robot data can provide a range of physical capabilities, more complex tasks like “Move apple between can and orange” also require understanding the semantic relationships between objects in an image, basic common sense, and other symbolic knowledge that is not directly related to the robot’s physical capabilities.

So we decided to add another massive source of data to the mix: Internet-scale image and text data. We used an existing large vision-language model that is already proficient at many tasks that require some understanding of the connection between natural language and images. The model is similar to the ones available to the public such as ChatGPT or Bard. These models are trained to output text in response to prompts containing images, allowing them to solve problems such as visual question-answering, captioning, and other open-ended visual understanding tasks. We discovered that such models can be adapted to robotic control simply by training them to also output robot actions in response to prompts framed as robotic commands (such as “Put the banana on the plate”). We applied this approach to the robotics data from the RT-X collaboration.
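
One common way to let a language model emit robot actions, in the spirit of what is described above, is to discretize each continuous action dimension into a small vocabulary of bins the model can produce like words. The sketch below shows that encoding and decoding step in isolation; the bin count and normalization are illustrative assumptions, not the collaboration's exact scheme.

```python
import numpy as np

N_BINS = 256             # size of the action "vocabulary" (illustrative)
LOW, HIGH = -1.0, 1.0    # assume actions are pre-normalized to [-1, 1]

def actions_to_tokens(action: np.ndarray) -> list[int]:
    """Quantize each action dimension into one of N_BINS discrete tokens."""
    clipped = np.clip(action, LOW, HIGH)
    bins = np.round((clipped - LOW) / (HIGH - LOW) * (N_BINS - 1)).astype(int)
    return bins.tolist()

def tokens_to_actions(tokens: list[int]) -> np.ndarray:
    """Map emitted tokens back to continuous values on the quantization grid."""
    bins = np.asarray(tokens, dtype=float)
    return LOW + bins / (N_BINS - 1) * (HIGH - LOW)

if __name__ == "__main__":
    action = np.array([0.12, -0.40, 0.05, 0.0, 0.0, 0.3, 1.0])   # e.g., xyz, rotation, gripper
    tokens = actions_to_tokens(action)
    recovered = tokens_to_actions(tokens)
    print(tokens)
    print(float(np.max(np.abs(recovered - action))))   # quantization error, at most half a bin width
```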

The RT-X model uses images or text descriptions of specific robot arms doing different tasks to output a series of discrete actions that will allow any robot arm to do those tasks. By collecting data from many robots doing many tasks from robotics labs around the world, we are building an open-source dataset that can be used to teach robots to be generally useful.
Illustration: Chris Philpot

To evaluate the combination of Internet-acquired smarts and multirobot data, we tested our RT-X model with Google’s mobile manipulator robot. We gave it our hardest generalization benchmark tests. The robot had to recognize objects and successfully manipulate them, and it also had to respond to complex text commands by making logical inferences that required integrating information from both text and images. The latter is one of the things that make humans such good generalists. Could we give our robots at least a hint of such capabilities?

Even without specific training, this Google research robot is able to follow the instruction “move apple between can and orange.” This capability is enabled by RT-X, a large robotic manipulation dataset and the first step towards a general robotic brain.

We conducted two sets of evaluations. As a baseline, we used a model that excluded all of the generalized multirobot RT-X data that didn’t involve Google’s robot. Google’s robot-specific dataset is in fact the largest part of the RT-X dataset, with over 100,000 demonstrations, so the question of whether all the other multirobot data would actually help in this case was very much open. Then we tried again with all that multirobot data included.

In one of the most difficult evaluation scenarios, the Google robot needed to accomplish a task that involved reasoning about spatial relations (“Move apple between can and orange”); in another task it had to solve rudimentary math problems (“Place an object on top of a paper with the solution to ‘2+3’”). These challenges were meant to test the crucial capabilities of reasoning and drawing conclusions.

In this case, the reasoning capabilities (such as the meaning of “between” and “on top of”) came from the Web-scale data included in the training of the vision-language model, while the ability to ground the reasoning outputs in robotic behaviors—commands that actually moved the robot arm in the right direction—came from training on cross-embodiment robot data from RT-X. Some examples of evaluations where we asked the robots to perform tasks not included in their training data are shown below.

While these tasks are rudimentary for humans, they present a major challenge for general-purpose robots. Without robotic demonstration data that clearly illustrates concepts like “between,” “near,” and “on top of,” even a system trained on data from many different robots would not be able to figure out what these commands mean. By integrating Web-scale knowledge from the vision-language model, our complete system was able to solve such tasks, deriving the semantic concepts (in this case, spatial relations) from Internet-scale training, and the physical behaviors (picking up and moving objects) from multirobot RT-X data. To our surprise, we found that the inclusion of the multirobot data improved the Google robot’s ability to generalize to such tasks by a factor of three. This result suggests that not only was the multirobot RT-X data useful for acquiring a variety of physical skills, it could also help to better connect such skills to the semantic and symbolic knowledge in vision-language models. These connections give the robot a degree of common sense, which could one day enable robots to understand the meaning of complex and nuanced user commands like “Bring me my breakfast” while carrying out the actions to make it happen.

The next steps for RT-X

The RT-X project shows what is possible when the robot-learning community acts together. Because of this cross-institutional effort, we were able to put together a diverse robotic dataset and carry out comprehensive multirobot evaluations that wouldn’t be possible at any single institution. Since the robotics community can’t rely on scraping the Internet for training data, we need to create that data ourselves. We hope that more researchers will contribute their data to the RT-X database and join this collaborative effort. We also hope to provide tools, models, and infrastructure to support cross-embodiment research. We plan to go beyond sharing data across labs, and we hope that RT-X will grow into a collaborative effort to develop data standards, reusable models, and new techniques and algorithms.

Our early results hint at how large cross-embodiment robotics models could transform the field. Much as large language models have mastered a wide range of language-based tasks, in the future we might use the same foundation model as the basis for many real-world robotic tasks. Perhaps new robotic skills could be enabled by fine-tuning or even prompting a pretrained foundation model. In a similar way to how you can prompt ChatGPT to tell a story without first training it on that particular story, you could ask a robot to write “Happy Birthday” on a cake without having to tell it how to use a piping bag or what handwritten text looks like. Of course, much more research is needed for these models to take on that kind of general capability, as our experiments have focused on single arms with two-finger grippers doing simple manipulation tasks.

As more labs engage in cross-embodiment research, we hope to further push the frontier on what is possible with a single neural network that can control many robots. These advances might include adding diverse simulated data from generated environments, handling robots with different numbers of arms or fingers, using different sensor suites (such as depth cameras and tactile sensing), and even combining manipulation and locomotion behaviors. RT-X has opened the door for such work, but the most exciting technical developments are still ahead.

This is just the beginning. We hope that with this first step, we can together create the future of robotics: where general robotic brains can power any robot, benefiting from data shared by all robots around the world.



The generative AI revolution embodied in tools like ChatGPT, Midjourney, and many others is at its core based on a simple formula: Take a very large neural network, train it on a huge dataset scraped from the Web, and then use it to fulfill a broad range of user requests. Large language models (LLMs) can answer questions, write code, and spout poetry, while image-generating systems can create convincing cave paintings or contemporary art.

So why haven’t these amazing AI capabilities translated into the kinds of helpful and broadly useful robots we’ve seen in science fiction? Where are the robots that can clean off the table, fold your laundry, and make you breakfast?

Unfortunately, the highly successful generative AI formula—big models trained on lots of Internet-sourced data—doesn’t easily carry over into robotics, because the Internet is not full of robotic-interaction data in the same way that it’s full of text and images. Robots need robot data to learn from, and this data is typically created slowly and tediously by researchers in laboratory environments for very specific tasks. Despite tremendous progress on robot-learning algorithms, without abundant data we still can’t enable robots to perform real-world tasks (like making breakfast) outside the lab. The most impressive results typically only work in a single laboratory, on a single robot, and often involve only a handful of behaviors.

If the abilities of each robot are limited by the time and effort it takes to manually teach it to perform a new task, what if we were to pool together the experiences of many robots, so a new robot could learn from all of them at once? We decided to give it a try. In 2023, our labs at Google and the University of California, Berkeley, came together with 32 other robotics laboratories in North America, Europe, and Asia to undertake the RT-X project, with the goal of assembling data, resources, and code to make general-purpose robots a reality.

Here is what we learned from the first phase of this effort.

How to create a generalist robot

Humans, by contrast, are far better at this kind of learning. Our brains can, with a little practice, handle what are essentially changes to our body plan, which happens when we pick up a tool, ride a bicycle, or get in a car. That is, our “embodiment” changes, but our brains adapt. RT-X is aiming for something similar in robots: to enable a single deep neural network to control many different types of robots, a capability called cross-embodiment. The question is whether a deep neural network trained on data from a sufficiently large number of different robots can learn to “drive” all of them—even robots with very different appearances, physical properties, and capabilities. If so, this approach could unlock the power of large datasets for robotic learning.

The scale of this project is very large because it has to be. The RT-X dataset currently contains nearly a million robotic trials for 22 types of robots, including many of the most commonly used robotic arms on the market. The robots in this dataset perform a huge range of behaviors, including picking and placing objects, assembly, and specialized tasks like cable routing. In total, there are about 500 different skills and interactions with thousands of different objects. It’s the largest open-source dataset of real robotic actions in existence.
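For a concrete sense of how such a dataset is organized: the RT-X data is released as a collection of per-robot datasets in an episodic format that can be read with TensorFlow Datasets. The sketch below is illustrative only; the storage path and the exact observation and action keys vary between the component datasets, so treat those names as assumptions to check against each dataset’s published feature specification.

```python
# Minimal sketch (assumptions noted): reading episodes from one RT-X component dataset.
# The bucket path and feature keys are examples, not guaranteed names; inspect
# builder.info.features for the dataset you actually load.
import tensorflow_datasets as tfds

builder = tfds.builder_from_directory(
    builder_dir="gs://gresearch/robotics/bridge/0.1.0"  # example path, may differ
)
dataset = builder.as_dataset(split="train[:10]")  # a handful of episodes

for episode in dataset:
    for step in episode["steps"]:      # each episode is itself a sequence of timesteps
        observation = step["observation"]
        image = observation["image"]   # camera frame (key varies by robot)
        action = step["action"]        # the robot action recorded at this step
        # Many of the component datasets also store a natural-language task
        # instruction alongside the image, under a dataset-specific key.
```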

Surprisingly, we found that our multirobot data could be used with relatively simple machine-learning methods, provided that we followed the recipe of using large neural-network models with large datasets. Leveraging the same kinds of models used in current LLMs like ChatGPT, we were able to train robot-control algorithms that do not require any special features for cross-embodiment. Much like a person can drive a car or ride a bicycle using the same brain, a model trained on the RT-X dataset can simply recognize what kind of robot it’s controlling from what it sees in the robot’s own camera observations. If the robot’s camera sees a UR10 industrial arm, the model sends commands appropriate to a UR10; if it instead sees a low-cost WidowX hobbyist arm, it moves that arm accordingly.
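To make that idea concrete, here is a minimal sketch of what “no special features for cross-embodiment” means in practice: a single policy consumes the robot’s own camera image plus the task instruction and emits an action in one shared, normalized end-effector format, and only a thin per-robot adapter rescales that output for a specific arm. The class names, the seven-dimensional action layout, and the adapter scaling are illustrative assumptions, not the actual RT-X code.

```python
# Illustrative sketch, not the RT-X implementation: one policy for every robot,
# with only a thin per-robot adapter converting a shared action into arm commands.
from dataclasses import dataclass
import numpy as np

@dataclass
class SharedAction:
    # Assumed common action format: normalized end-effector motion plus gripper.
    delta_xyz: np.ndarray   # (3,) translation of the end effector, in [-1, 1]
    delta_rpy: np.ndarray   # (3,) rotation of the end effector, in [-1, 1]
    gripper: float          # > 0 means close, < 0 means open

class CrossEmbodimentPolicy:
    """A single network for all robots; the embodiment is inferred from the image."""
    def __init__(self, model):
        self.model = model  # e.g. a network trained on the pooled multirobot data

    def act(self, camera_image: np.ndarray, instruction: str) -> SharedAction:
        out = np.asarray(self.model(camera_image, instruction))  # no robot-ID input
        return SharedAction(out[:3], out[3:6], float(out[6]))

def to_ur10_command(a: SharedAction, step_m: float = 0.02, step_rad: float = 0.1):
    # Per-robot adapter: rescale the normalized action for this arm's controller.
    return {"tool_delta_m": a.delta_xyz * step_m,
            "tool_delta_rad": a.delta_rpy * step_rad,
            "close_gripper": a.gripper > 0}

# Usage with a stand-in model (a real deployment would load trained weights):
policy = CrossEmbodimentPolicy(lambda image, text: np.zeros(7))
command = to_ur10_command(policy.act(np.zeros((256, 256, 3)), "pick up the sponge"))
```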

To test the capabilities of our model, five of the laboratories involved in the RT-X collaboration each tested it in a head-to-head comparison against the best control system they had developed independently for their own robot. Each lab’s test involved the tasks it was using for its own research, which included things like picking up and moving objects, opening doors, and routing cables through clips. Remarkably, the single unified model provided improved performance over each laboratory’s own best method, succeeding at the tasks about 50 percent more often on average.

While this result might seem surprising, we found that the RT-X controller could leverage the diverse experiences of other robots to improve robustness in different settings. Even within the same laboratory, every time a robot attempts a task, it finds itself in a slightly different situation, and so drawing on the experiences of other robots in other situations helped the RT-X controller cope with natural variability and edge cases. Here are a few examples of the range of these tasks:




Building robots that can reason

Encouraged by our success with combining data from many robot types, we next sought to investigate how such data can be incorporated into a system with more in-depth reasoning capabilities. Complex semantic reasoning is hard to learn from robot data alone. While the robot data can provide a range of physical capabilities, more complex tasks like “Move apple between can and orange” also require understanding the semantic relationships between objects in an image, basic common sense, and other symbolic knowledge that is not directly related to the robot’s physical capabilities.

So we decided to add another massive source of data to the mix: Internet-scale image and text data. We used an existing large vision-language model that is already proficient at many tasks requiring an understanding of the connection between natural language and images. The model is similar to ones available to the public, such as ChatGPT or Bard. These models are trained to output text in response to prompts containing images, allowing them to solve problems such as visual question-answering, captioning, and other open-ended visual understanding tasks. We discovered that such models can be adapted to robotic control simply by training them to also output robot actions in response to prompts framed as robotic commands (such as “Put the banana on the plate”). We applied this approach to the robotics data from the RT-X collaboration.
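One way to make a text-generating vision-language model emit robot actions, consistent with the “discrete actions” described below, is to quantize each continuous action dimension into a fixed number of bins and let the model output the bin indices as if they were ordinary tokens. The sketch below shows only that encoding and decoding step; the bin count and the normalized action range are assumptions for illustration.

```python
# Illustrative sketch: representing continuous robot actions as token-like text so a
# vision-language model can be trained to emit them, then decoding them for execution.
# The bin count (256) and the [-1, 1] normalized action range are assumptions.
import numpy as np

NUM_BINS = 256

def encode_action(action: np.ndarray) -> str:
    """Map each action dimension in [-1, 1] to an integer bin, rendered as text."""
    clipped = np.clip(action, -1.0, 1.0)
    bins = np.round((clipped + 1.0) / 2.0 * (NUM_BINS - 1)).astype(int)
    return " ".join(str(b) for b in bins)  # e.g. "128 200 3 127 127 127 255"

def decode_action(token_text: str) -> np.ndarray:
    """Invert the encoding: a token string from the model becomes a continuous action."""
    bins = np.array([int(t) for t in token_text.split()], dtype=float)
    return bins / (NUM_BINS - 1) * 2.0 - 1.0

# Training pairs then look like ordinary prompt-to-text examples:
#   prompt: <camera image> + "Put the banana on the plate"
#   target: encode_action(recorded_robot_action)
```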

The RT-X model uses images or text descriptions of specific robot arms doing different tasks to output a series of discrete actions that will allow any robot arm to do those tasks. By collecting data from many robots doing many tasks from robotics labs around the world, we are building an open-source dataset that can be used to teach robots to be generally useful. Chris Philpot

To evaluate the combination of Internet-acquired smarts and multirobot data, we tested our RT-X model with Google’s mobile manipulator robot. We gave it our hardest generalization benchmark tests. The robot had to recognize objects and successfully manipulate them, and it also had to respond to complex text commands by making logical inferences that required integrating information from both text and images. The latter is one of the things that make humans such good generalists. Could we give our robots at least a hint of such capabilities?

Even without specific training, this Google research robot is able to follow the instruction “move apple between can and orange.” This capability is enabled by RT-X, a large robotic manipulation dataset and the first step towards a general robotic brain.

We conducted two sets of evaluations. As a baseline, we used a model trained only on the data collected with Google’s own robot, excluding all of the multirobot RT-X data from the other platforms. Google’s robot-specific dataset is in fact the largest part of the RT-X dataset, with over 100,000 demonstrations, so the question of whether all the other multirobot data would actually help in this case was very much open. Then we tried again with all of the multirobot data included.

In one of the most difficult evaluation scenarios, the Google robot needed to accomplish a task that involved reasoning about spatial relations (“Move apple between can and orange”); in another task it had to solve rudimentary math problems (“Place an object on top of a paper with the solution to ‘2+3’”). These challenges were meant to test the crucial capabilities of reasoning and drawing conclusions.

In this case, the reasoning capabilities (such as the meaning of “between” and “on top of”) came from the Web-scale data included in the training of the vision-language model, while the ability to ground the reasoning outputs in robotic behaviors—commands that actually moved the robot arm in the right direction—came from training on cross-embodiment robot data from RT-X. Some examples of evaluations where we asked the robots to perform tasks not included in their training data are shown below.

While these tasks are rudimentary for humans, they present a major challenge for general-purpose robots. Without robotic demonstration data that clearly illustrates concepts like “between,” “near,” and “on top of,” even a system trained on data from many different robots would not be able to figure out what these commands mean. By integrating Web-scale knowledge from the vision-language model, our complete system was able to solve such tasks, deriving the semantic concepts (in this case, spatial relations) from Internet-scale training, and the physical behaviors (picking up and moving objects) from multirobot RT-X data.

To our surprise, we found that the inclusion of the multirobot data improved the Google robot’s ability to generalize to such tasks by a factor of three. This result suggests that not only was the multirobot RT-X data useful for acquiring a variety of physical skills, it could also help to better connect such skills to the semantic and symbolic knowledge in vision-language models. These connections give the robot a degree of common sense, which could one day enable robots to understand the meaning of complex and nuanced user commands like “Bring me my breakfast” while carrying out the actions to make it happen.

The next steps for RT-X

The RT-X project shows what is possible when the robot-learning community acts together. Because of this cross-institutional effort, we were able to put together a diverse robotic dataset and carry out comprehensive multirobot evaluations that wouldn’t be possible at any single institution. Since the robotics community can’t rely on scraping the Internet for training data, we need to create that data ourselves. We hope that more researchers will contribute their data to the RT-X database and join this collaborative effort. We also hope to provide tools, models, and infrastructure to support cross-embodiment research. We plan to go beyond sharing data across labs, and we hope that RT-X will grow into a collaborative effort to develop data standards, reusable models, and new techniques and algorithms.

Our early results hint at how large cross-embodiment robotics models could transform the field. Much as large language models have mastered a wide range of language-based tasks, in the future we might use the same foundation model as the basis for many real-world robotic tasks. Perhaps new robotic skills could be enabled by fine-tuning or even prompting a pretrained foundation model. In a similar way to how you can prompt ChatGPT to tell a story without first training it on that particular story, you could ask a robot to write “Happy Birthday” on a cake without having to tell it how to use a piping bag or what handwritten text looks like. Of course, much more research is needed for these models to take on that kind of general capability, as our experiments have focused on single arms with two-finger grippers doing simple manipulation tasks.

As more labs engage in cross-embodiment research, we hope to further push the frontier on what is possible with a single neural network that can control many robots. These advances might include adding diverse simulated data from generated environments, handling robots with different numbers of arms or fingers, using different sensor suites (such as depth cameras and tactile sensing), and even combining manipulation and locomotion behaviors. RT-X has opened the door for such work, but the most exciting technical developments are still ahead.

This is just the beginning. We hope that with this first step, we can together create the future of robotics: where general robotic brains can power any robot, benefiting from data shared by all robots around the world.

Musculoskeletal models provide a way to simulate the human body in a variety of human-robot applications. A promising use for musculoskeletal models is modelling the physical capabilities of the human body, for example, estimating the strength available at the hand. Several methods of modelling and representing human strength with musculoskeletal models have been used in ergonomic analysis, human-robot interaction, and robotic assistance. However, it is currently unclear which methods best suit modelling and representing limb strength. This paper compares existing methods for calculating and representing the strength of the upper limb using musculoskeletal models. It then details the differences and relative advantages of these methods, enabling a discussion of the appropriateness of each method for particular applications.
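As one concrete illustration of calculating and representing limb strength (a common representation in robotics, not necessarily one of the specific methods compared in the paper): given a translational hand Jacobian and joint torque limits taken from a musculoskeletal or rigid-body limb model, the reachable hand forces can be summarized as a force ellipsoid. The sketch below assumes symmetric joint-torque limits as a stand-in for muscle strength; the function name and inputs are illustrative.

```python
# Illustrative sketch: a force-ellipsoid representation of hand strength, assuming the
# hand Jacobian J and symmetric joint torque limits tau_max come from the limb model.
import numpy as np

def hand_force_ellipsoid(J: np.ndarray, tau_max: np.ndarray):
    """Principal directions and magnitudes of achievable hand force.

    J: (3, n) translational Jacobian of the hand with respect to the n joints.
    tau_max: (n,) joint torque limits, with tau = J^T F.
    Describes the ellipsoid {F : F^T (J W^-2 J^T) F <= 1}, W = diag(tau_max),
    a 2-norm relaxation of the box constraint |tau_i| <= tau_max_i: returns the
    principal axis directions (columns) and the force magnitudes along them.
    """
    W_inv2 = np.diag(1.0 / np.square(np.asarray(tau_max, dtype=float)))
    A = J @ W_inv2 @ J.T                 # 3 x 3, positive definite if J has rank 3
    eigvals, eigvecs = np.linalg.eigh(A)
    magnitudes = 1.0 / np.sqrt(eigvals)  # force magnitude along each principal axis
    return eigvecs, magnitudes
```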
