Feed aggregator



Nvidia’s ongoing GTC developer conference in San Jose is, unsurprisingly, almost entirely about AI this year. But in between the AI developments, Nvidia has also made a couple of significant robotics announcements.

First, there’s Project GR00T (with each letter and number pronounced individually so as not to invoke the wrath of Disney), a foundation model for humanoid robots. And secondly, Nvidia has committed to be a founding platinum member of the Open Source Robotics Alliance, a new initiative from the Open Source Robotics Foundation intended to make sure that the Robot Operating System (ROS), a collection of open-source software libraries and tools, has the support that it needs to flourish.

GR00T

First, let’s talk about GR00T (short for “Generalist Robot 00 Technology”). The way that Nvidia presenters enunciated it letter-by-letter during their talks strongly suggests that in private they just say “Groot.” So the rest of us can also just say “Groot” as far as I’m concerned.

As a “general-purpose foundation model for humanoid robots,” GR00T is intended to provide a starting point for specific humanoid robots to do specific tasks. As you might expect from something being presented for the first time at an Nvidia keynote, it’s awfully vague at the moment, and we’ll have to get into it more later on. Here’s pretty much everything useful that Nvidia has told us so far:

“Building foundation models for general humanoid robots is one of the most exciting problems to solve in AI today,” said Jensen Huang, founder and CEO of NVIDIA. “The enabling technologies are coming together for leading roboticists around the world to take giant leaps towards artificial general robotics.”

Robots powered by GR00T... will be designed to understand natural language and emulate movements by observing human actions—quickly learning coordination, dexterity and other skills in order to navigate, adapt and interact with the real world.

This sounds good, but that “will be” is doing a lot of heavy lifting. Like, there’s a very significant “how” missing here. More specifically, we’ll need a better understanding of what’s underlying this foundation model: is there real robot data under there somewhere, or is it based on a massive amount of simulation? Are the humanoid robotics companies involved contributing data to improve GR00T, or instead training their own models based on it? It’s certainly notable that Nvidia is name-dropping most of the heavy-hitters in commercial humanoids, including 1X Technologies, Agility Robotics, Apptronik, Boston Dynamics, Figure AI, Fourier Intelligence, Sanctuary AI, Unitree Robotics, and XPENG Robotics. We’ll be able to check in with some of those folks directly this week to hopefully learn more.

On the hardware side, Nvidia is also announcing a new computing platform called Jetson Thor:

Jetson Thor was created as a new computing platform capable of performing complex tasks and interacting safely and naturally with people and machines. It has a modular architecture optimized for performance, power and size. The SoC includes a next-generation GPU based on NVIDIA Blackwell architecture with a transformer engine delivering 800 teraflops of 8-bit floating point AI performance to run multimodal generative AI models like GR00T. With an integrated functional safety processor, a high-performance CPU cluster and 100GB of ethernet bandwidth, it significantly simplifies design and integration efforts.

Speaking of Nvidia’s Blackwell architecture—today the company also unveiled its B200 Blackwell GPU. And to round out the announcements, the chip foundry TSMC and Synopsys, an electronic design automation company, each said they will be moving Nvidia’s inverse lithography tool, cuLitho, into production.

The Open Source Robotics Alliance

The other big announcement is actually from the Open Source Robotics Foundation, which is launching the Open Source Robotics Alliance (OSRA), a “new initiative to strengthen the governance of our open-source robotics software projects and ensure the health of the Robot Operating System (ROS) Suite community for many years to come.” Nvidia is an inaugural platinum member of the OSRA, but they’re not alone—other platinum members include Intrinsic and Qualcomm. Other significant members include Apex, Clearpath Robotics, Ekumen, eProsima, PickNik, Silicon Valley Robotics, and Zettascale.

“The [Open Source Robotics Foundation] had planned to restructure its operations by broadening community participation and expanding its impact in the larger ROS ecosystem,” explains Vanessa Yamzon Orsi, CEO of the Open Source Robotics Foundation. “The sale of [Open Source Robotics Corporation] was the first step towards that vision, and the launch of the OSRA is the next big step towards that change.”

We had time for a brief Q&A with Orsi to better understand how this will affect the ROS community going forward.

You structured the OSRA to have a mixed membership and meritocratic model like the Linux Foundation—what does that mean, exactly?

Vanessa Yamzon Orsi: We have modeled the OSRA to allow for paths to participation in its activities through both paid memberships (for organizations and their representatives) and the community members who support the projects through their contributions. The mixed model enables participation in the way most appropriate for each organization or individual: contributing funding as a paying member, contributing directly to project development, or both.

What are some benefits for the ROS ecosystem that we can look forward to through OSRA?

Orsi: We expect the OSRA to benefit the OSRF’s projects in three significant ways.

  • By providing a stable stream of funding to cover the maintenance and development of the ROS ecosystem.
  • By encouraging greater community involvement in development through open processes and open, meritocratic status achievement.
  • By bringing greater community involvement in governance and ensuring that all stakeholders have a voice in decision-making.

Why will this be a good thing for ROS users?

Orsi: The OSRA will ensure that ROS and the suite of open source projects under the stewardship of Open Robotics will continue to be supported and strengthened for years to come. By providing organized governance and oversight, clearer paths to community participation, and financial support, it will provide stability and structure to the projects while enabling continued development.



Along with the development of speech and language technologies, the market for speech-enabled human-robot interactions (HRI) has grown in recent years. However, people often find their conversational interactions with such robots far from satisfactory. One of the reasons is the habitability gap, where the usability of a speech-enabled agent drops when its flexibility increases. For social robots, such flexibility is reflected in the diverse choice of robots’ appearances, sounds and behaviours, which shape a robot’s ‘affordance’. Whilst designers or users have enjoyed the freedom of constructing a social robot by integrating off-the-shelf technologies, such freedom comes at a potential cost: the users’ perceptions and satisfaction. Designing appropriate affordances is essential for the quality of HRI. It is hypothesised that a social robot with aligned affordances could create an appropriate perception of the robot and increase users’ satisfaction when speaking with it. Given that previous studies of affordance alignment mainly focus on one interface’s characteristics and face-voice match, we aim to deepen our understanding of affordance alignment with a robot’s behaviours and use cases. In particular, we investigate how a robot’s affordances affect users’ perceptions in different types of use cases. For this purpose, we conducted an exploratory experiment that included three different affordance settings (adult-like, child-like, and robot-like) and three use cases (informative, emotional, and hybrid). Participants were invited to talk to social robots in person. A mixed-methods approach was employed for quantitative and qualitative analysis of 156 interaction samples. The results show that static affordance (face and voice) has a statistically significant effect on the perceived warmth of the first impression; use cases affect people’s perceptions more on perceived competence and warmth before and after interactions. In addition, the results show the importance of aligning static affordance with behavioural affordance. General design principles of behavioural affordances are proposed. We anticipate that our empirical evidence will provide a clearer guideline for speech-enabled social robots’ affordance design and serve as a starting point for more sophisticated design guidelines, such as personalised affordance design for individual or group users in different contexts.

Introduction: Collaborative robots, designed to work alongside humans for manipulating end-effectors, greatly benefit from the implementation of active constraints. This process comprises the definition of a boundary, followed by the enforcement of some control algorithm when the robot tooltip interacts with the generated boundary. Contact with the constraint boundary is communicated to the human operator through various potential forms of feedback. In fields like surgical robotics, where patient safety is paramount, implementing active constraints can prevent the robot from interacting with portions of the patient anatomy that shouldn’t be operated on. Despite improvements in orthopaedic surgical robots, however, there exists a gap between bulky systems with haptic feedback capabilities and miniaturised systems that only allow for boundary control, where interaction with the active constraint boundary interrupts robot functions. Generally, active constraint generation relies on optical tracking systems and preoperative imaging techniques.

Methods: This paper presents a refined version of the Signature Robot, a three degrees-of-freedom, hands-on collaborative system for orthopaedic surgery. Additionally, it presents a method for generating and enforcing active constraints “on-the-fly” using our previously introduced monocular, RGB, camera-based network, SimPS-Net. The network was deployed in real-time for the purpose of boundary definition. This boundary was subsequently used for constraint enforcement testing. The robot was utilised to test two different active constraints: a safe region and a restricted region.

Results: The network success rate, defined as the ratio of correct over total object localisation results, was calculated to be 54.7% ± 5.2%. In the safe region case, haptic feedback resisted tooltip manipulation beyond the active constraint boundary, with a mean distance from the boundary of 2.70 mm ± 0.37 mm and a mean exit duration of 0.76 s ± 0.11 s. For the restricted-zone constraint, the operator was successfully prevented from penetrating the boundary in 100% of attempts.

Discussion: This paper showcases the viability of the proposed robotic platform and presents promising results of a versatile constraint generation and enforcement pipeline.
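To make the two constraint types concrete, here is a minimal sketch of how a safe region and a restricted region might be enforced. The abstract does not describe the paper's control law, so everything here is an assumption for illustration: the boundary is taken to be a sphere, the feedback is a simple virtual spring, and the function name `constraint_force` and the stiffness value are hypothetical.

```python
import math

def constraint_force(tip, centre, radius, mode, stiffness=500.0):
    """Spring-like restoring force (N) for a spherical active constraint.

    tip, centre: (x, y, z) positions in metres.
    mode: "safe" keeps the tip inside the sphere,
          "restricted" keeps it outside.
    stiffness: hypothetical virtual-spring gain in N/m.
    """
    if mode not in ("safe", "restricted"):
        raise ValueError("mode must be 'safe' or 'restricted'")

    offset = [t - c for t, c in zip(tip, centre)]
    dist = math.sqrt(sum(o * o for o in offset))
    # Outward unit normal; pick an arbitrary axis if the tip sits on the centre.
    normal = [o / dist for o in offset] if dist > 0.0 else [1.0, 0.0, 0.0]

    if mode == "safe":
        violation = dist - radius   # > 0 once the tip has left the safe region
        direction = -1.0            # push back towards the centre
    else:
        violation = radius - dist   # > 0 once the tip is inside the restricted region
        direction = 1.0             # push out towards the surface

    if violation <= 0.0:
        return (0.0, 0.0, 0.0)      # constraint not violated: no feedback
    magnitude = stiffness * violation
    return tuple(direction * magnitude * n for n in normal)


# Tooltip 2 mm outside a 10 mm safe sphere: the force points back inwards.
print(constraint_force((0.012, 0.0, 0.0), (0.0, 0.0, 0.0), 0.010, "safe"))
```

A real system would define the boundary from the localisation network's output rather than a fixed sphere, and would likely add damping so the tooltip does not oscillate at the boundary.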



About a year ago, Zipline introduced Platform 2, an approach to precision urban drone delivery that combines a large hovering drone with a smaller package-delivery “Droid.” Lowered on a tether from the belly of its parent Zip drone, the Droid contains thrusters and sensors (plus a 2.5- to 3.5-kilogram payload) to reliably navigate itself to a delivery area of just one meter in diameter. The Zip, meanwhile, safely remains roughly 100 meters up. After depositing its payload, the Droid rises back up to the drone on its tether, and off they go.

At first glance, the sensor and thruster-packed Droid seems complicated enough to be bordering on impractical, especially when you consider the relative simplicity of other drone delivery solutions, which commonly just drop the package itself on a tether from a hovering drone. I’ve been writing about robots long enough that I’m suspicious of robotic solutions that appear to be overengineered, since that’s always a huge temptation with robotics. Like, is this really the best way of solving a problem, or is it just the coolest way?

We know the folks at Zipline pretty well, though, and they’ve certainly made creative engineering work for them, as we saw when we visited one of their “nests” in rural Rwanda. So as Zipline nears the official launch of Platform 2, we spoke with Zipline cofounder and CTO Keenan Wyrobek, Platform 2 lead Zoltan Laszlo, and industrial designer Gregoire Vandenbussche to understand exactly why they think this is the best way of solving precision urban drone delivery.

First, a quick refresher. Here’s what the delivery sequence with the vertical takeoff and landing (VTOL) Zip and the Droid looks like:

The system has a service radius of about 16 kilometers (10 miles), and it can make deliveries to outdoor spaces of “any meaningful size.” Visual sensors on the Droid find the delivery site and check for obstacles on the way down, while the thrusters compensate for wind and movement of the parent drone. Since the big VTOL Zip remains well out of the way, deliveries are fast, safe, and quiet. But it takes two robots to pull off the delivery rather than just one.

On the other end is the infrastructure required to load and charge these drones. Zipline’s Platform 1 drones require a dedicated base with relatively large launch and recovery systems. With Platform 2, the drone drops the Droid into a large chute attached to the side of a building so that the Droid can be reloaded, after which it pulls the Droid out again and flies off to make the delivery:

“We think it’s the best delivery experience. Not the best drone delivery experience, the best delivery experience,” Zipline’s Wyrobek tells us. That may be true, but the experience also has to be practical and sustainable for Zipline to be successful, so we asked the Zipline team to explain the company’s approach to precision urban delivery.


IEEE Spectrum: What problems is Platform 2 solving, and why is it necessary to solve those problems in this specific way?

Keenan Wyrobek: There are literally billions of last-mile deliveries happening every year in [the United States] alone, and our customers have been asking for years for something that can deliver to their homes. With our long-range platform, Platform 1, we can float a package down into your yard on a parachute, but that takes some space. And so one half of the big design challenge was how to get our deliveries precise enough, while the other half was to develop a system that will bolt on to existing facilities, which Platform 1 doesn’t do.

Zoltan Laszlo: Platform 1 can deliver within an area of about two parking spaces. As we started to actually look at the data in urban areas using publicly available lidar surveys, we found that two parking spaces serves a bit more than half the market. We want to be a universal delivery service.

But with a delivery area of 1 meter in diameter, which is what we’re actually hitting in our delivery demonstrations for Platform 2, that gets us into the high 90s for the percentage of people that we can deliver to.

Wyrobek: When we say “urban,” what we’re talking about is three-story sprawl, which is common in many large cities around the world. And we wanted to make sure that our deliveries could be precise enough for places like that.

There are some existing solutions for precision aerial delivery that have been operating at scale with some success, typically by winching packages to the ground from a VTOL drone. Why develop your own technique rather than just going with something that has already been shown to work?

Laszlo: Winching down is the natural extension of being able to hover in place, and when we first started, we were like, “Okay, we’re just going to winch down. This will be great, super easy.”

So we went to our test site in Half Moon Bay [on the Northern California coast] and built a quick prototype of a winch system. But as soon as we lowered a box down on the winch, the wind started blowing it all over the place. And this was from the height of our lift, which is less than 10 meters up. We weren’t even able to stay inside two parking spaces, which told us that something was broken with our approach.

The aircraft can sense the wind, so we thought we’d be able to find the right angle for the delivery and things like that. But the wind where the aircraft is may be different from the wind nearer the ground. We realized that unless we’re delivering to an open field, a package that does not have active wind compensation is going to be very hard to control. We’re targeting high-90th percentile in terms of availability due to weather—even if it’s a pretty blustery day, we still want to be able to deliver.

Wyrobek: This was a wild insight when we really understood that unless it’s a perfect day, using a winch actually takes almost as much space as we use for Platform 1 floating a package down on a parachute.
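A back-of-the-envelope calculation shows why a passive package on a winch drifts so much. The sketch below is not Zipline's analysis: it treats the package as a point mass on a massless tether in a steady wind, so the tether settles at the angle where drag balances weight. The box size, drag coefficient, and wind speeds are made-up illustrative values; only the roughly 3-kilogram payload and 10-meter drop come from the article.

```python
import math

AIR_DENSITY = 1.225  # kg/m^3, sea-level standard air
GRAVITY = 9.81       # m/s^2

def tether_offset(wind_speed, tether_length, mass, frontal_area, drag_coeff):
    """Steady-state horizontal offset (m) of a package hanging in a steady wind."""
    drag = 0.5 * AIR_DENSITY * drag_coeff * frontal_area * wind_speed ** 2
    angle = math.atan2(drag, mass * GRAVITY)  # tether angle from vertical
    return tether_length * math.sin(angle)

# Hypothetical box: 3 kg, 0.3 m x 0.3 m face, drag coefficient ~1.05, 10 m tether.
for wind in (2.0, 5.0, 8.0, 12.0):
    offset = tether_offset(wind, 10.0, 3.0, 0.09, 1.05)
    print(f"{wind:4.1f} m/s wind -> {offset:.2f} m offset")
```

With these assumed numbers, an 8 m/s wind pushes the box a little over a meter off target before any gusts or pendulum swing are considered, and in this simple model the offset grows in proportion to tether length. That is consistent with the team's conclusion that a package without active wind compensation is hard to place precisely.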

Engineering test footage of Zipline’s Platform 2 docking system at their test site in Half Moon Bay in California.

How did you arrive at this particular delivery solution for Platform 2?

Laszlo: I don’t remember whose idea it was, but we were playing with a bunch of different options. Putting thrusters on the tether wasn’t even the craziest idea. We had our Platform 1 aircraft, which was reliable, so we started with looking at ways to just make that aircraft deliver more precisely. There was only so much more we could do with passive parachutes, but what does an active, steerable parachute look like? There are remote-controlled paragliding toys out there that we tested, with mixed results—the challenge is to minimize the smarts in your parachute, because there’s a chance you won’t get it back. So then we started some crazy brainstorming about how to reliably retrieve the parachute.

Wyrobek: One idea was that the parachute would come with a self-return envelope that you could stick in the mail. Another idea was that the parachute would be steered by a little drone, and when the package got dropped off, the drone would reel the parachute in and then fly back up into the Zip.

Laszlo: But when we realized that the package has to be able to steer itself, that meant the Zip doesn’t need to be active. The Zip doesn’t need to drive the package, it doesn’t even need to see the package, it just needs to be a point up in the sky that’s holding the package. That let us move from having the Zip 50 feet up, to having it 300 feet up, which is important because it’s a big, heavy drone that we don’t want in our customer’s space. And the final step was adding enough smarts to the thing coming down into your space to figure out where exactly to deliver to, and of course to handle the wind.

Once you knew what you needed to do, how did you get to the actual design of the droid?

Gregoire Vandenbussche: Zipline showed me pretty early on that they were ready to try crazy ideas, and from my experience, that’s extremely rare. When the idea of having this controllable tether with a package attached to it came up, one of my first thoughts was that from a user standpoint, nothing like this exists. And the difficulty of designing something that doesn’t exist is that people will try to identify it according to what they know. So we had to find a way to drive that thinking towards something positive.

Early Droid concept sketches by designer Gregoire Vandenbussche featured legs that would fold up after delivery. Credit: Zipline

First we thought about putting words onto it, like “hello” or something, but the reality is that we’re an international company and we need to be able to work everywhere. But there’s one thing that’s common to everyone, and that’s emotions—people are able to recognize certain things as being approachable and adorable, so going in that direction felt like the right thing to do. However, being able to design a robot that gives you that kind of emotion but also flies was quite a challenge. We took inspiration from other things that move in 3D, like sea mammals—things that people will recognize even without thinking about it.

Vandenbussche’s sketches show how the design of the Droid was partially inspired by dolphins. Credit: Zipline

Now that you say it, I can definitely see the sea mammal inspiration in the drone.

Vandenbussche: There are two aspects of sea mammals that work really well for our purpose. One of them is simplicity of shape; sea mammals don’t have all that many details. Also, they tend to be optimized for performance. Ultimately, we need that, because we need to be able to fly. And we need to be able to convey to people that the drone is under control. So having something you can tell is moving forward or turning or moving away was very helpful.

Wyrobek: One other insight that we had is that Platform 2 needs to be small to fit into tight delivery spaces, and it needs to feel small when it comes into your personal space, but it also has to be big enough inside to be a useful delivery platform. We tried to leverage the chubby but cute look that baby seals have going on.

The design journey was pretty fun. Gregoire would spend two or three days coming up with a hundred different concept sketches. We’d do a bunch of brainstorming, and then Gregoire would come up with a whole bunch of new directions, and we’d keep exploring. To be clear, no one would describe our functional prototypes from back then as “cute.” But through all this iteration eventually we ended up in an awesome place.

And how do you find that place? When do you know that your robot is just cute enough?

One iteration of the Droid, Vandenbussche determined, looked too technical and intimidating. Credit: Zipline

Vandenbussche: It’s finding the balance around what’s realistic and functional. I like to think of industrial design as taking all of the constraints and kind of playing Tetris with them until you get a result that ideally satisfies everybody. I remember at one point looking at where we were, and feeling like we were focusing too much on performance and missing that emotional level. So, we went back a little bit to say, where can we bring this back from looking like a highly technical machine to something that can give you a feeling of approachability?

Laszlo: We spent a fair bit of time on the controls and behaviors of the droid to make sure that it moves in a very approachable and predictable way, so that you know where it’s going ahead of time and it doesn’t behave in unexpected ways. That’s pretty important for how people perceive it.

We did a lot of work on how the droid would descend and approach the delivery site. One concept had the droid start to lower down well before the Zip was hovering directly overhead. We had simulations and renderings, and it looked great. We could do the whole delivery in barely over 20 seconds. But even if the package is far away from you, it still looks scary because [the Zip is] moving faster than you would expect, and you can’t tell exactly where it’s going to deliver. So we deleted all that code, and now it just comes straight down, and people don’t back away from the Droid anymore. They’re just like, “Oh, okay, cool.”

How did you design the thrusters to enable these pinpoint deliveries?

Early tests of the Droid centered around a two-fan version. Credit: Zipline

Laszlo: With the thrusters, we knew we wanted to maximize the size of at least one of the fans, because we were almost always going to have to deal with wind. We’re trying to be as quiet as we can, so the key there is to maximize the area of the propeller. Our leading early design was just a box with two fans on it:

Two fans with unobstructed flow meant that it moved great, but the challenge of fitting it inside another aircraft was going to be painful. And it looked big, even though it wasn’t actually that big.

Vandenbussche: It was also pretty intimidating when you had those two fans facing you and the Droid coming toward you.

A single steerable fan [left] that acted like a rudder was simpler in some ways, but as the fan got larger, the gyroscopic effects became hard to manage. Instead of one steerable fan, how about two steerable fans? [right] Omnidirectional motion was possible with this setup, but packaging it inside of a Zip didn’t work. Credit: Zipline

Laszlo: We then started looking at configurations with a main fan and a second smaller fan, with the bigger fan at the back pushing forward and the smaller fan at the front providing thrust for turning. The third fan we added relatively late because we didn’t want to add it at all. But we found that [with two fans] the droid would have to spin relatively quickly to align to shifting winds, whereas with a third fan we can just push sideways in the direction that we need.
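One way to see what the third fan buys is to write down a toy thrust-allocation problem. The sketch below is purely illustrative and not Zipline's controller: it assumes one main fan that produces forward force, plus two sideways-facing fans mounted ahead of and behind the center of mass. The lever arms, the reversible fans, and the function names are all hypothetical.

```python
def allocate_thrust(fx, fy, tz, arm_front=0.2, arm_rear=0.2):
    """Split a desired planar force and yaw torque among three fans.

    fx: forward force (N), handled by the main fan.
    fy: sideways force (N), shared by the two lateral fans.
    tz: yaw torque (N*m), positive when the front is pushed in +y.
    arm_front, arm_rear: hypothetical lever arms (m) of the lateral fans.
    Returns (main, front, rear) thrusts in newtons.
    """
    span = arm_front + arm_rear
    # Solve: front + rear = fy  and  arm_front*front - arm_rear*rear = tz
    front = (tz + arm_rear * fy) / span
    rear = (arm_front * fy - tz) / span
    return fx, front, rear


def yaw_from_single_lateral_fan(fy, arm_front=0.2):
    """Unwanted yaw torque (N*m) if only the front fan supplies the side force."""
    return arm_front * fy


# Pure sideways push with no rotation: both lateral fans share the load.
print(allocate_thrust(0.0, 2.0, 0.0))
# The same push from the front fan alone would also twist the vehicle.
print(yaw_from_single_lateral_fan(2.0))
```

In this simplified picture, two lateral fans can produce any combination of side force and yaw torque, whereas a single lateral fan couples the two, which matches Laszlo's point that the two-fan Droid had to spin to face shifting winds.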

What kind of intelligence does the Droid have?

The current design of Zipline’s Platform 2 Droid is built around a large thruster in the rear and two smaller thrusters at the front and back. Credit: Zipline

Wyrobek: The Droid has its own little autopilot, and there’s a very simple communications system between the two vehicles. You may think that it’s a really complex coordinated control problem, but it’s not: The Zip just kind of hangs out, and the Droid takes care of the delivery. The sensing challenge is for the Droid to find trees and powerlines and things like that, and then find a good delivery site.

Was there ever a point at which you were concerned that the size and weight and complexity would not be worth it?

Wyrobek: Our mindset was to fail fast, to try things and do what we needed to do to convince ourselves that it wasn’t a good path. What’s fun about this kind of iterative process is oftentimes, you try things and you realize that actually, this is better than we thought.

Laszlo: We first thought about the Droid as a little bit of a tax, in that it’s costing us extra weight. But if your main drone can stay high enough up that it avoids trees and buildings, then it can just float around up there. If it gets pushed around by the wind, it doesn’t matter because the Droid can compensate.

Wyrobek: Keeping the Zip at altitude is a big win in many ways. It doesn’t have to spend energy station-keeping, descending, and then ascending again. We just do that with the much smaller Droid, which also makes the hovering phase much shorter. It’s also much more efficient to control the small droid than the large Zip. And having all of the sensors on the Droid very close to the area that you’re delivering to makes that problem easier as well. It may look like a more complex system from the outside, but from the inside, it’s basically making all the hardest problems much easier.

Over the past year, Zipline has set up a bunch of partnerships to make residential deliveries to consumers using the Droid starting in 2024, including prescriptions from Cleveland Clinic in Ohio, medical products from WellSpan Health in Pennsylvania, tasty food from Mendocino Farms in California, and a little bit of everything from Walmart starting in Dallas. Zipline’s plan is to kick things off with Platform 2 later this year.



Legendary MIT roboticist Daniela Rus has published a new book called The Heart and the Chip: Our Bright Future with Robots. “There is a robotics revolution underway,” Rus says in the book’s introduction, “one that is already causing massive changes in our society and in our lives.” She’s quite right, of course, and although some of us have been feeling that this is true for decades, it’s arguably more true right now than it ever has been. But robots are difficult and complicated, and the way that their progress is intertwined with the humans that make them and work with them means that these changes won’t come quickly or easily. Rus’ experience gives her a deep and nuanced perspective on robotics’ past and future, and we’re able to share a little bit of that with you here.

Daniela Rus: Should roboticists consider subscribing to their own Hippocratic oath?

The following excerpt is from Chapter 14, entitled “What Could Go Wrong?” Which, let’s be honest, is the right question to ask (and then attempt to conclusively answer) whenever you’re thinking about sending a robot out into the real world.

At several points in this book I’ve mentioned the fictional character Tony Stark, who uses technology to transform himself into the superhero Iron Man. To me this character is a tremendous inspiration, yet I often remind myself that in the story, he begins his career as an MIT-trained weapons manufacturer and munitions developer. In the 2008 film Iron Man, he changes his ways because he learns that his company’s specialized weapons are being used by terrorists.

Remember, robots are tools. Inherently, they are neither good nor bad; it’s how we choose to use them that matters. In 2022, aerial drones were used as weapons on both sides of devastating wars. Anyone can purchase a drone, but there are regulations for using drones that vary between and within different countries. In the United States, the Federal Aviation Administration requires that all drones be registered, with a few exceptions, including toy models weighing less than 250 grams. The rules also depend on whether the drone is flown for fun or for business. Regardless of regulations, anyone could use a flying robot to inflict harm, just like anyone can swing a hammer to hurt someone instead of driving a nail into a board. Yet drones are also being used to deliver critical medical supplies in hard-to-reach areas, track the health of forests, and help scientists like Roger Payne monitor and advocate for at-risk species. My group collaborated with the modern dance company Pilobolus to stage the first theatrical performance featuring a mix of humans and drones back in 2012, with a robot called Seraph. So, drones can be dancers, too. In Kim Stanley Robinson’s prescient science fiction novel The Ministry for the Future, a swarm of unmanned aerial vehicles is deployed to crash an airliner. I can imagine a flock of these mechanical birds being used in many good ways, too. At the start of its war against Ukraine, Russia limited its citizens’ access to unbiased news and information in hopes of controlling and shaping the narrative around the conflict. The true story of the invasion was stifled, and I wondered whether we could have dispatched a swarm of flying video screens capable of arranging themselves into one giant aerial monitor in the middle of popular city squares across Russia, showing real footage of the war, not merely clips approved by the government. Or, even simpler: swarms of flying digital projectors could have broadcasted the footage on the sides of buildings and walls for all to see. If we had deployed enough, there would have been too many of them to shut down.

The Tony Stark character is shaped by his experiences and steered toward having a positive impact on the world, but we cannot wait for all of our technologists to endure harrowing, life-changing experiences. Nor can we expect everyone to use these intelligent machines for good once they are developed and moved out into circulation. Yet that doesn’t mean we should stop working on these technologies—the potential benefits are too great. What we can do is think harder about the consequences and put in place the guardrails to ensure positive benefits. My contemporaries and I can’t necessarily control how these tools are used in the world, but we can do more to influence the people making them.

There may be variations of Tony Stark passing through my university or the labs of my colleagues around the world, and we need to do whatever we can to ensure these talented young individuals endeavor to have a positive impact on humanity. We absolutely must have diversity in our university labs and research centers, but we may be able to do more to shape the young people who study with us. For example, we could require study of the Manhattan Project and the moral and ethical quandaries associated with the phenomenal effort to build and use the atomic bomb. At this point, ethics courses are not a widespread requirement for an advanced degree in robotics or AI, but perhaps they should be. Or why not require graduates to swear to a robotics- and AI-attuned variation on the Hippocratic oath?

The oath comes from an early Greek medical text, which may or may not have been written by the philosopher Hippocrates, and it has evolved over the centuries. Fundamentally, it represents a standard of medical ethics to which doctors are expected to adhere. The most famous of these is the promise to do no harm, or to avoid intentional wrongdoing. I also applaud the oath’s focus on committing to the community of doctors and the necessity of maintaining the sacred bond between teacher and pupils. The more we remain linked as a robotics community, the more we foster and maintain our relationships as our students move out into the world, the more we can do to steer the technology toward a positive future. Today the Hippocratic oath is not a universal requirement for certification as a doctor, and I do not see it functioning that way for roboticists, either. Nor am I the first roboticist or AI leader to suggest this possibility. But we should seriously consider making it standard practice.

In the aftermath of the development of the atomic bomb, when the potential of scientists to do harm was made suddenly and terribly evident, there was some discussion of a Hippocratic oath for scientific researchers. The idea has resurfaced from time to time and rarely gains traction. But science is fundamentally about the pursuit of knowledge; in that sense it is pure. In robotics and AI, we are building things that will have an impact on the world and its people and other forms of life. In this sense, our field is somewhat closer to medicine, as doctors are using their training to directly impact the lives of individuals. Asking technologists to formally recite a version of the Hippocratic oath could be a way to continue nudging our field in the right direction, and perhaps serve as a check on individuals who are later asked to develop robots or AI expressly for nefarious purposes.

Of course, the very idea of what is good or bad, in terms of how a robot is used, depends on where you sit. I am steadfastly opposed to giving armed or weaponized robots autonomy. We cannot and should not trust machine intelligences to make decisions about whether to inflict harm on a person or group of people on their own. Personally, I would prefer that robots never be used to do harm to anyone, but this is now unrealistic. Robots are being used as tools of war, and it is our responsibility to do whatever we can to shape their ethical use. So, I do not separate or divorce myself from reality and operate solely in some utopian universe of happy, helpful robots. In fact, I teach courses on artificial intelligence to national security officials and advise them on the strengths, weaknesses, and capabilities of the technology. I see this as a patriotic duty, and I’m honored to be helping our leaders understand the limitations, strengths, and possibilities of robots and other AI-enhanced physical systems—what they can and cannot do, what they should and should not do, and what I believe they must do.

Ultimately, no matter how much we teach and preach about the limitations of technology, the ethics of AI, or the potential dangers of developing such powerful tools, people will make their own choices, whether they are recently graduated students or senior national security leaders. What I hope and teach is that we should choose to do good. Despite the efforts of life extension companies, we all have a limited time on this planet, what the scientist Carl Sagan called our “pale blue dot,” and we should do whatever we can to make the most of that time and have a positive impact on our beautiful environment, and the many people and other species with which we share it. My decades-long quest to build more intelligent and capable robots has only strengthened my appreciation for—no, wonder at—the marvelous creatures that crawl, walk, swim, run, slither, and soar across and around our planet, and the fantastic plants, too. We should not busy ourselves with the work of developing robots that can eliminate these cosmically rare creations. We should focus instead on building technologies to preserve them, and even help them thrive. That applies to all living entities, including the one species that is especially concerned about the rise of intelligent machines.


Excerpted from “The Heart and the Chip: Our Bright Future with Robots”. Copyright 2024 by Daniela Rus, Gregory Mone. Used with permission of the publisher, W.W. Norton & Company. All rights reserved.



Legendary MIT roboticist Daniela Rus has published a new book called The Heart and the Chip: Our Bright Future with Robots. “There is a robotics revolution underway,” Rus says in the book’s introduction, “one that is already causing massive changes in our society and in our lives.” She’s quite right, of course, and although some of us have been feeling that this is true for decades, it’s arguably more true right now than it ever has been. But robots are difficult and complicated, and the way that their progress is intertwined with the humans that make them and work with them means that these changes won’t come quickly or easily. Rus’ experience gives her a deep and nuanced perspective on robotics’ past and future, and we’re able to share a little bit of that with you here.

Daniela Rus: Should roboticists consider subscribing to their own Hippocratic oath?

The following excerpt is from Chapter 14, entitled “What Could Go Wrong?” Which, let’s be honest, is the right question to ask (and then attempt to conclusively answer) whenever you’re thinking about sending a robot out into the real world.

At several points in this book I’ve mentioned the fictional character Tony Stark, who uses technology to transform himself into the superhero Iron Man. To me this character is a tremendous inspiration, yet I often remind myself that in the story, he begins his career as an MIT-­trained weapons manufacturer and munitions developer. In the 2008 film Iron Man, he changes his ways because he learns that his company’s specialized weapons are being used by terrorists.

Remember, robots are tools. Inherently, they are neither good nor bad; it’s how we choose to use them that matters. In 2022, aerial drones were used as weapons on both sides of devastating wars. Anyone can purchase a drone, but there are regulations for using drones that vary between and within different countries. In the United States, the Federal Aviation Administration requires that all drones be registered, with a few exceptions, including toy models weighing less than 250 grams. The rules also depend on whether the drone is flown for fun or for business. Regardless of regulations, anyone could use a flying robot to inflict harm, just like anyone can swing a hammer to hurt someone instead of driving a nail into a board. Yet drones are also being used to deliver critical medical supplies in hard-­to-­reach areas, track the health of forests, and help scientists like Roger Payne monitor and advocate for at-­risk species. My group collaborated with the modern dance company Pilobolus to stage the first theatrical performance featuring a mix of humans and drones back in 2012, with a robot called Seraph. So, drones can be dancers, too. In Kim Stanley Robinson’s prescient science fiction novel The Ministry for the Future, a swarm of unmanned aerial vehicles is deployed to crash an airliner. I can imagine a flock of these mechanical birds being used in many good ways, too. At the start of its war against Ukraine, Russia limited its citizens’ access to unbiased news and information in hopes of controlling and shaping the narrative around the conflict. The true story of the invasion was stifled, and I wondered whether we could have dispatched a swarm of flying video screens capable of arranging themselves into one giant aerial monitor in the middle of popular city squares across Russia, showing real footage of the war, not merely clips approved by the government. 
Or, even simpler: swarms of flying digital projectors could have broadcasted the footage on the sides of buildings and walls for all to see. If we had deployed enough, there would have been too many of them to shut down.

There may be variations of Tony Stark passing through my university or the labs of my colleagues around the world, and we need to do whatever we can to ensure these talented young individuals endeavor to have a positive impact on humanity.

The Tony Stark character is shaped by his experiences and steered toward having a positive impact on the world, but we cannot wait for all of our technologists to endure harrowing, life-­changing experiences. Nor can we expect everyone to use these intelligent machines for good once they are developed and moved out into circulation. Yet that doesn’t mean we should stop working on these technologies—­the potential benefits are too great. What we can do is think harder about the consequences and put in place the guardrails to ensure positive benefits. My contemporaries and I can’t necessarily control how these tools are used in the world, but we can do more to influence the people making them.

There may be variations of Tony Stark passing through my university or the labs of my colleagues around the world, and we need to do whatever we can to ensure these talented young individuals endeavor to have a positive impact on humanity. We absolutely must have diversity in our university labs and research centers, but we may be able to do more to shape the young people who study with us. For example, we could require study of the Manhattan Project and the moral and ethical quandaries associated with the phenomenal effort to build and use the atomic bomb. At this point, ethics courses are not a widespread requirement for an advanced degree in robotics or AI, but perhaps they should be. Or why not require graduates to swear to a robotics- and AI-attuned variation on the Hippocratic oath?

The oath comes from an early Greek medical text, which may or may not have been written by the philosopher Hippocrates, and it has evolved over the centuries. Fundamentally, it represents a standard of medical ethics to which doctors are expected to adhere. The most famous of these is the promise to do no harm, or to avoid intentional wrongdoing. I also applaud the oath’s focus on committing to the community of doctors and the necessity of maintaining the sacred bond between teacher and pupils. The more we remain linked as a robotics community, the more we foster and maintain our relationships as our students move out into the world, the more we can do to steer the technology toward a positive future. Today the Hippocratic oath is not a universal requirement for certification as a doctor, and I do not see it functioning that way for roboticists, either. Nor am I the first roboticist or AI leader to suggest this possibility. But we should seriously consider making it standard practice.

In the aftermath of the development of the atomic bomb, when the potential of scientists to do harm was made suddenly and terribly evident, there was some discussion of a Hippocratic oath for scientific researchers. The idea has resurfaced from time to time and rarely gains traction. But science is fundamentally about the pursuit of knowledge; in that sense it is pure. In robotics and AI, we are building things that will have an impact on the world and its people and other forms of life. In this sense, our field is somewhat closer to medicine, as doctors are using their training to directly impact the lives of individuals. Asking technologists to formally recite a version of the Hippocratic oath could be a way to continue nudging our field in the right direction, and perhaps serve as a check on individuals who are later asked to develop robots or AI expressly for nefarious purposes.

Of course, the very idea of what is good or bad, in terms of how a robot is used, depends on where you sit. I am steadfastly opposed to giving armed or weaponized robots autonomy. We cannot and should not trust machine intelligences to decide on their own whether to inflict harm on a person or group of people. Personally, I would prefer that robots never be used to do harm to anyone, but this is now unrealistic. Robots are being used as tools of war, and it is our responsibility to do whatever we can to shape their ethical use. So, I do not separate or divorce myself from reality and operate solely in some utopian universe of happy, helpful robots. In fact, I teach courses on artificial intelligence to national security officials and advise them on the strengths, weaknesses, and capabilities of the technology. I see this as a patriotic duty, and I’m honored to be helping our leaders understand the limitations, strengths, and possibilities of robots and other AI-enhanced physical systems: what they can and cannot do, what they should and should not do, and what I believe they must do.

Ultimately, no matter how much we teach and preach about the limitations of technology, the ethics of AI, or the potential dangers of developing such powerful tools, people will make their own choices, whether they are recently graduated students or senior national security leaders. What I hope and teach is that we should choose to do good. Despite the efforts of life extension companies, we all have a limited time on this planet, what the scientist Carl Sagan called our “pale blue dot,” and we should do whatever we can to make the most of that time and have a positive impact on our beautiful environment, and the many people and other species with which we share it. My decades-long quest to build more intelligent and capable robots has only strengthened my appreciation for (no, wonder at) the marvelous creatures that crawl, walk, swim, run, slither, and soar across and around our planet, and the fantastic plants, too. We should not busy ourselves with the work of developing robots that can eliminate these cosmically rare creations. We should focus instead on building technologies to preserve them, and even help them thrive. That applies to all living entities, including the one species that is especially concerned about the rise of intelligent machines.


Excerpted from “The Heart and the Chip: Our Bright Future with Robots”. Copyright 2024 by Daniela Rus, Gregory Mone. Used with permission of the publisher, W.W. Norton & Company. All rights reserved.



Your weekly selection of awesome robot videos

Video Friday is your weekly selection of awesome robotics videos, collected by your friends at IEEE Spectrum robotics. We also post a weekly calendar of upcoming robotics events for the next few months. Please send us your events for inclusion.

HRI 2024: 11–15 March 2024, BOULDER, COLO.
Eurobot Open 2024: 8–11 May 2024, LA ROCHE-SUR-YON, FRANCE
ICRA 2024: 13–17 May 2024, YOKOHAMA, JAPAN

Enjoy today’s videos!

How many quadrupeds can you control with one single locomotion policy? Apparently, the answer is “all of the quadrupeds.”

Look for this at ICRA 2024 in a couple of months!

[ EPFL ]

Thanks, Milad!

Very impressive performance from Figure 01, I think, although as is frequently the case, it’s hard to tell exactly how impressive without more information about exactly what’s going on here.

[ Figure ]

That awesome ANYmal Parkour research is now published, which means that there’s a new video, well worth watching all the way to the end.

[ Science ] via [ ETHZ RSL ]

Robotic vision can be pretty tricky when you’re cooking, because things can significantly change how they look over time, like with melting butter or an egg being fried. Some new research is tackling this, using a (now ancient?) PR2.

[ JSK Lab ]

Thanks, Kento!

Filmed in January of 2020, this video shows Atlas clearing debris and going through a doorway. Uses a combination of simple footstep planning, teleoperation, and autonomous behaviors through a single virtual reality operator interface. Robot built by Boston Dynamics for the DARPA Robotics Challenge in 2013.

[ IHMC ]

Sustainable fashion enabled by smart textiles shaped by a robot and a heat gun. Multiple styles, multiple sizes, all in one garment!

[ MIT ]

Video of Boston Dynamics’ Stretch from MODEX, with a little sneak peek at the end of what the robot’s next warehouse task might be.

[ Boston Dynamics ]

Pickle Robots autonomously unload trucks and import containers. The system is in production use at customer warehouses handling floor-loaded freight at human scale or better.

[ Pickle Robot ]

The ROBDEKON robotics competence center is dedicated to the development of robotic systems for hazardous environments that pose a potential risk to humans. As part of the consortium, the FZI Research Center for Information Technology developed robotic systems, technologies, and artificial intelligence (AI) methods that can be used to handle hazardous materials, for example to sort potentially dangerous used batteries for recycling.

[ FZI ]

This research project with Ontario Power Generation involves adapting Boston Dynamics Spot’s localization system to long-term changes in the environment. During this testing, we mounted a GoPro camera on the back of Spot and took a video of each walk for a year from Spot’s point of view. We put the footage together as a moving time-lapse video where the day changes as Spot completes the Autowalk around the campus.

[ MARS Lab ]




Robots have tremendous potential, and have recently been introduced not only for simple operations in factories, but also in workplaces where customer service communication is required. However, communication robots have not always been accepted. This study proposes a three-stage (first contact, interaction, and decision) model for robot acceptance based on the human cognitive process flow to design preferred robots and clarifies the elements of the robot and the processes that affect robot acceptance decision-making. Unlike previous robot acceptance models, the current model focuses on a sequential account of how people decide to accept, considering the interaction (or carry-over) effect between impressions established at each stage. Following the model, this study conducted a scenario-based experiment examining how the impression formed at first contact (a robot’s appearance) and the impression formed during interaction with the robot (politeness of its conversation and behavior) affect robot acceptance in both successful and slightly failed situations. The better the appearance of the robot and the more polite its behavior, the greater the acceptance rate. Importantly, there was no interaction between these two factors. Because the impressions from first contact and from interaction appear to be processed additively, the results suggest accumulating further evidence that improving the robot’s appearance and making its communication behavior more human-like in politeness will lead to a more acceptable robot design.

Actuator failure on a remotely deployed robot results in decreased efficiency or even renders it inoperable. Robustness to these failures will become critical as robots are required to be more independent and operate out of the range of repair. To address these challenges, we present two approaches based on modular robotic architecture to improve robustness to actuator failure of both fixed-configuration robots and modular reconfigurable robots. Our work uses modular reconfigurable robots capable of modifying their style of locomotion and changing their designed morphology through ejecting modules. For both simulated and physical robots, this framework increased the distance travelled and decreased the effort needed to move through the environment. When the deployed robot was allowed to change its locomotion style, it showed improved robustness to actuator failure when compared to a robot with a fixed controller. Furthermore, a robot capable of changing its locomotion and design morphology statistically outlasted both tests with a fixed morphology. Testing was carried out using a Gazebo simulation and validated in multiple tests in the field. We show for the first time that ejecting modular failed components can improve the overall mission length.

One of the greatest challenges to the automated production of goods is equipment malfunction. Ideally, machines should be able to automatically predict and detect operational faults in order to minimize downtime and plan for timely maintenance. While traditional condition-based maintenance (CBM) involves costly sensor additions and engineering, machine learning approaches offer the potential to learn from already existing sensors. Implementations of data-driven CBM typically use supervised and semi-supervised learning to classify faults. These methods need not only a large collection of operation data but also records of faulty operation, which are often costly to obtain. Instead of classifying faults, we use an approach to detect abnormal behaviour within the machine’s operation. This approach is analogous to semi-supervised anomaly detection in machine learning (ML), with important distinctions in experimental design and evaluation specific to the problem of industrial fault detection. We present a novel method of machine fault detection using temporal-difference learning and General Value Functions (GVFs). Using GVFs, we form a predictive model of sensor data to detect faulty behaviour. As sensor data from machines is not i.i.d. but closer to Markovian sampling, temporal-difference learning methods should be well suited for this data. We compare our GVF outlier detection (GVFOD) algorithm to a broad selection of multivariate and temporal outlier detection methods, using datasets collected from a tabletop robot emulating the movement of an industrial actuator. We find that not only does GVFOD achieve the same recall score as other multivariate OD algorithms, it also attains significantly higher precision. Furthermore, GVFOD has intuitive hyperparameters which can be selected based upon expert knowledge of the application. Together, these findings allow for more reliable detection of abnormal machine behaviour and better-timed maintenance, saving resources, time, and cost.
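As a rough illustration of the idea in this abstract, here is a minimal Python sketch of learning a general value function with tabular TD(0) on fault-free sensor readings and then using the size of the TD error as an outlier score. Everything specific in it is an assumption made for illustration: the toy position readings, the state definition, the `GAMMA` and `ALPHA` values, and the choice of mean absolute TD error as the score. It is not the authors' GVFOD implementation, and the paper may score outliers differently.

```python
GAMMA = 0.9  # discount factor of the GVF (assumed value)
ALPHA = 0.5  # TD(0) step size (assumed value)


def td_errors(readings, values, learn):
    """Sweep a sensor stream with tabular TD(0) and return the absolute TD errors.

    The state is the pair (previous reading, current reading) and the cumulant
    is the next reading, so values[state] estimates the discounted sum of
    future readings. Unseen states default to a prediction of 0.
    """
    errors = []
    for t in range(1, len(readings) - 1):
        state = (readings[t - 1], readings[t])
        next_state = (readings[t], readings[t + 1])
        delta = (readings[t + 1]
                 + GAMMA * values.get(next_state, 0.0)
                 - values.get(state, 0.0))
        if learn:
            values[state] = values.get(state, 0.0) + ALPHA * delta
        errors.append(abs(delta))
    return errors


def outlier_score(readings, values):
    """Mean absolute TD error of a window under the frozen predictions."""
    errors = td_errors(readings, values, learn=False)
    return sum(errors) / len(errors)


# Hypothetical fault-free data: an actuator sweeping between positions 0 and 3.
normal = [0, 1, 2, 3, 2, 1] * 100
values = {}
for _ in range(5):
    td_errors(normal, values, learn=True)

# Made-up fault: the actuator stalls at position 2 before carrying on.
faulty = [0, 1, 2, 2, 2, 2, 1, 0, 1, 2, 3, 2, 1, 0]

normal_score = outlier_score(normal[:60], values)
faulty_score = outlier_score(faulty, values)
print(f"normal window: {normal_score:.6f}, faulty window: {faulty_score:.6f}")
```

On this toy data the fault-free window scores close to zero, because the learned predictions match what actually happens next, while the stalled window scores far higher. Real sensor data is noisy, so in practice a threshold would have to be chosen from scores on held-out normal operation.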

Introduction: Children and adolescents with neurological impairments face reduced participation and independence in daily life activities due to walking difficulties. Existing assistive devices often offer insufficient support, potentially leading to wheelchair dependence and limiting physical activity and daily life engagement. Mobile wearable robots, such as exoskeletons and exosuits, have shown promise in supporting adults during activities of daily living but are underexplored for children.

Methods: We conducted a cross-sectional study to examine the potential of a cable-driven exosuit, the Myosuit, to enhance walking efficiency in adolescents with diverse ambulatory impairments. Each participant walked a course including up-hill, down-hill, level ground walking, and stairs ascending and descending, with and without the exosuit’s assistance. We monitored the time and step count to complete the course and the average heart rate and muscle activity. Additionally, we assessed the adolescents’ perspective on the exosuit’s utility using a visual analog scale.

Results: Six adolescents completed the study. Five participants completed the course in less time with the exosuit’s assistance, although the difference was not statistically significant (time reduction range: [-3.87, 17.42]%, p-value: 0.08, effect size: 0.88). The number of steps taken decreased significantly with the Myosuit’s assistance (steps reduction range: [1.07, 15.71]%, p-value: 0.04, effect size: 0.90). Heart rate and muscle activity did not differ between Myosuit-assisted and unassisted conditions (p-value: 0.96 and 0.35, effect size: 0.02 and 0.42, respectively). Participants generally perceived reduced effort and increased safety with the Myosuit’s assistance, especially during tasks involving concentric contractions (e.g., walking uphill). Three participants expressed a willingness to use the Myosuit in daily life, while the others found it heavy or too conspicuous.

Discussion: Increased walking speed without increasing physical effort when performing activities of daily living could lead to higher levels of participation and increased functional independence. Despite perceiving the benefits introduced by the exosuit’s assistance, adolescents reported the need for further modification of the device design before using it extensively at home and in the community.
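The reduction ranges reported in the results include a negative lower bound for time, which is easiest to read with the formula written out. The Python sketch below assumes each percentage is measured relative to the unassisted trial; the abstract does not state the formula, and the course times used here are hypothetical.

```python
def percent_reduction(unassisted, assisted):
    """Reduction relative to the unassisted trial, in percent.

    A negative result means the assisted trial took longer (or more steps).
    """
    return 100.0 * (unassisted - assisted) / unassisted


# Hypothetical course times in seconds, chosen only for illustration.
print(percent_reduction(200.0, 170.0))  # 15.0: faster with assistance
print(percent_reduction(200.0, 208.0))  # -4.0: slower with assistance
```

Under that assumption, a range such as [-3.87, 17.42]% would mean that at least one participant was slightly slower with the exosuit while the rest were faster.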

Heterogeneous multi-agent systems can be deployed to complete a variety of tasks, including some that are impossible using a single generic modality. This paper introduces an approach to solving the problem of cooperative behavior planning in small heterogeneous robot teams where members can both function independently as well as physically interact with each other in ways that give rise to additional functionality. This approach enables, for the first time, the cooperative completion of tasks that are infeasible when using any single modality from those agents comprising the team.

Introduction: It is crucial to identify neurodevelopmental disorders in infants early on for timely intervention to improve their long-term outcomes. Combining natural play with quantitative measurements of developmental milestones can be an effective way to swiftly and efficiently detect infants who are at risk of neurodevelopmental delays. Clinical studies have established differences in toy interaction behaviors between full-term infants and pre-term infants who are at risk for cerebral palsy and other developmental disorders.

Methods: The proposed toy aims to improve the quantitative assessment of infant-toy interactions and fully automate the process of detecting those infants at risk of developing motor delays. This paper describes the design and development of a toy that uniquely utilizes a collection of soft lossy force sensors, developed using optical fibers, to gather play interaction data from infants lying supine in a gym. An example interaction database was created by having 15 adults complete a total of 2480 interactions with the toy, consisting of 620 touches, 620 punches (a “kick substitute”), 620 weak grasps, and 620 strong grasps.

Results: The data is analyzed for patterns of interaction with the toy face using a machine learning model developed to classify the four interactions present in the database. Results indicate that the configuration of 6 soft force sensors on the face created unique activation patterns.

Discussion: The machine learning algorithm was able to identify the distinct action types from the data, suggesting the potential usability of the toy. Next steps involve sensorizing the entire toy and testing with infants.

The environmental pollution caused by various sources has escalated the climate crisis, making the need to establish reliable, intelligent, and persistent environmental monitoring solutions more crucial than ever. Mobile sensing systems are a popular platform due to their cost-effectiveness and adaptability. However, in practice, operation environments demand highly intelligent and robust systems that can cope with an environment’s changing dynamics. To achieve this, reinforcement learning has become a popular tool, as it facilitates the training of intelligent and robust sensing agents that can handle unknown and extreme conditions. In this paper, a framework that formulates active sensing as a reinforcement learning problem is proposed. This framework allows unification with multiple essential environmental monitoring tasks and algorithms, such as coverage, patrolling, source seeking, exploration, and search and rescue. The unified framework represents a step towards bridging the divide between theoretical advancements in reinforcement learning and real-world applications in environmental monitoring. A critical review of the literature in this field is carried out, and it is found that, despite the potential of reinforcement learning for environmental active sensing applications, there is still a lack of practical implementation and most work remains in the simulation phase. It is also noted that, despite the consensus that multi-agent systems are crucial to fully realize the potential of active sensing, there is a lack of research in this area.



When IEEE Spectrum first wrote about Covariant in 2020, it was a new-ish robotics startup looking to apply robotics to warehouse picking at scale through the magic of a single end-to-end neural network. At the time, Covariant was focused on this picking use case, because it represents an application that could provide immediate value—warehouse companies pay Covariant for its robots to pick items in their warehouses. But for Covariant, the exciting part was that picking items in warehouses has, over the last four years, yielded a massive amount of real-world manipulation data—and you can probably guess where this is going.

Today, Covariant is announcing RFM-1, which the company describes as a robotics foundation model that gives robots the “human-like ability to reason.” That’s from the press release, and while I wouldn’t necessarily read too much into “human-like” or “reason,” what Covariant has going on here is pretty cool.

“Foundation model” means that RFM-1 can be trained on more data to do more things—at the moment, it’s all about warehouse manipulation because that’s what it’s been trained on, but its capabilities can be expanded by feeding it more data. “Our existing system is already good enough to do very fast, very variable pick and place,” says Covariant co-founder Pieter Abbeel. “But we’re now taking it quite a bit further. Any task, any embodiment—that’s the long-term vision. Robotics foundation models powering billions of robots across the world.” From the sound of things, Covariant’s business of deploying a large fleet of warehouse automation robots was the fastest way for them to collect the tens of millions of trajectories (how a robot moves during a task) that they needed to train the 8 billion parameter RFM-1 model.

“The only way you can do what we’re doing is by having robots deployed in the world collecting a ton of data,” says Abbeel. “Which is what allows us to train a robotics foundation model that’s uniquely capable.”

There have been other attempts at this sort of thing: The RT-X project is one recent example. But while RT-X depends on research labs sharing what data they have to create a dataset that’s large enough to be useful, Covariant is doing it alone, thanks to its fleet of warehouse robots. “RT-X is about a million trajectories of data,” Abbeel says, “but we’re able to surpass it because we’re getting a million trajectories every few weeks.”

“By building a valuable picking robot that’s deployed across 15 countries with dozens of customers, we essentially have a data collection machine.” —Pieter Abbeel, Covariant

You can think of the current execution of RFM-1 as a prediction engine for suction-based object manipulation in warehouse environments. The model incorporates still images, video, joint angles, force readings, and suction cup strength: everything involved in the kind of robotic manipulation that Covariant does. All of these things are interconnected within RFM-1, which means that you can put any of those things into one end of RFM-1, and out of the other end of the model will come a prediction. That prediction can be in the form of an image, a video, or a series of commands for a robot.

What’s important to understand about all of this is that RFM-1 isn’t restricted to picking only things it’s seen before, or only working on robots it has direct experience with. This is what’s nice about foundation models—they can generalize within the domain of their training data, and it’s how Covariant has been able to scale their business as successfully as they have, by not having to retrain for every new picking robot or every new item. What’s counter-intuitive about these large models is that they’re actually better at dealing with new situations than models that are trained specifically for those situations.

For example, let’s say you want to train a model to drive a car on a highway. The question, Abbeel says, is whether it would be worth your time to train on other kinds of driving anyway. The answer is yes, because highway driving is sometimes not highway driving. There will be accidents or rush hour traffic that will require you to drive differently. If you’ve also trained on driving on city streets, you’re effectively training on highway edge cases, which will come in handy at some point and improve performance overall. With RFM-1, it’s the same idea: Training on lots of different kinds of manipulation—different robots, different objects, and so on—means that any single kind of manipulation will be that much more capable.

In the context of generalization, Covariant talks about RFM-1’s ability to “understand” its environment. This can be a tricky word with AI, but what’s relevant is to ground the meaning of “understand” in what RFM-1 is capable of. For example, you don’t need to understand physics to be able to catch a baseball, you just need to have a lot of experience catching baseballs, and that’s where RFM-1 is at. You could also reason out how to catch a baseball with no experience but an understanding of physics, and RFM-1 is not doing this, which is why I hesitate to use the word “understand” in this context.

But this brings us to another interesting capability of RFM-1: it operates as a very effective, if constrained, simulation tool. As a prediction engine that outputs video, you can ask it to generate what the next couple seconds of an action sequence will look like, and it’ll give you a result that’s both realistic and accurate, being grounded in all of its data. The key here is that RFM-1 can effectively simulate objects that are challenging to simulate traditionally, like floppy things.

Covariant’s Abbeel explains that the “world model” that RFM-1 bases its predictions on is effectively a learned physics engine. “Building physics engines turns out to be a very daunting task to really cover every possible thing that can happen in the world,” Abbeel says. “Once you get complicated scenarios, it becomes very inaccurate, very quickly, because people have to make all kinds of approximations to make the physics engine run on a computer. We’re just doing the large-scale data version of this with a world model, and it’s showing really good results.”

Abbeel gives an example of asking a robot to simulate (or predict) what would happen if a cylinder is placed vertically on a conveyor belt. The prediction accurately shows the cylinder falling over and rolling when the belt starts to move—not because the cylinder is being simulated, but because RFM-1 has seen a lot of things being placed on a lot of conveyor belts.

This only works if there’s the right kind of data for RFM-1 to train on, so unlike most simulation environments, it can’t currently generalize to completely new objects or situations. But Abbeel believes that with enough data, useful world simulation will be possible. “Five years from now, it’s not unlikely that what we are building here will be the only type of simulator anyone will ever use. It’s a more capable simulator than one built from the ground up with collision checking and finite elements and all that stuff. All those things are so hard to build into your physics engine in any kind of way, not to mention the renderer to make things look like they look in the real world—in some sense, we’re taking a shortcut.”

RFM-1 also incorporates language data to be able to communicate more effectively with humans. Covariant

For Covariant to expand the capabilities of RFM-1 towards that long-term vision of foundation models powering “billions of robots across the world,” the next step is to feed it more data from a wider variety of robots doing a wider variety of tasks. “We’ve built essentially a data ingestion engine,” Abbeel says. “If you’re willing to give us data of a different type, we’ll ingest that too.”

“We have a lot of confidence that this kind of model could power all kinds of robots—maybe with more data for the types of robots and types of situations it could be used in.” —Pieter Abbeel, Covariant

One way or another, that path is going to involve a heck of a lot of data, and it’s going to be data that Covariant is not currently collecting with its own fleet of warehouse manipulation robots. So if you’re, say, a humanoid robotics company, what’s your incentive to share all the data you’ve been collecting with Covariant? “The pitch is that we’ll help them get to the real world,” Covariant co-founder Peter Chen says. “I don’t think there are really that many companies that have AI to make their robots truly autonomous in a production environment. If they want AI that’s robust and powerful and can actually help them enter the real world, we are really their best bet.”

Covariant’s core argument here is that while it’s certainly possible for every robotics company to train up their own models individually, the performance—for anybody trying to do manipulation, at least—would be not nearly as good as using a model that incorporates all of the manipulation data that Covariant already has within RFM-1. “It has always been our long term plan to be a robotics foundation model company,” says Chen. “There was just not sufficient data and compute and algorithms to get to this point—but building a universal AI platform for robots, that’s what Covariant has been about from the very beginning.”



When IEEE Spectrum first wrote about Covariant in 2020, it was a new-ish robotics startup looking to apply robotics to warehouse picking at scale through the magic of a single end-to-end neural network. At the time, Covariant was focused on this picking use case, because it represents an application that could provide immediate value—warehouse companies pay Covariant for its robots to pick items in their warehouses. But for Covariant, the exciting part was that picking items in warehouses has, over the last four years, yielded a massive amount of real-world manipulation data—and you can probably guess where this is going.

Today, Covariant is announcing RFM-1, which the company describes as a robotics foundation model that gives robots the “human-like ability to reason.” That’s from the press release, and while I wouldn’t necessarily read too much into “human-like” or “reason,” what Covariant has going on here is pretty cool.

“Foundation model” means that RFM-1 can be trained on more data to do more things—at the moment, it’s all about warehouse manipulation because that’s what it’s been trained on, but its capabilities can be expanded by feeding it more data. “Our existing system is already good enough to do very fast, very variable pick and place,” says Covariant co-founder Pieter Abbeel. “But we’re now taking it quite a bit further. Any task, any embodiment—that’s the long-term vision. Robotics foundation models powering billions of robots across the world.” From the sound of things, Covariant’s business of deploying a large fleet of warehouse automation robots was the fastest way for them to collect the tens of millions of trajectories (how a robot moves during a task) that they needed to train the 8 billion parameter RFM-1 model.


“The only way you can do what we’re doing is by having robots deployed in the world collecting a ton of data,” says Abbeel. “Which is what allows us to train a robotics foundation model that’s uniquely capable.”

There have been other attempts at this sort of thing: The RT-X project is one recent example. But while RT-X depends on research labs sharing what data they have to create a dataset that’s large enough to be useful, Covariant is doing it alone, thanks to its fleet of warehouse robots. “RT-X is about a million trajectories of data,” Abbeel says, “but we’re able to surpass it because we’re getting a million trajectories every few weeks.”
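To put those collection rates in perspective, here is a rough back-of-the-envelope sketch in Python. The only figures taken from Covariant are "about a million trajectories" for RT-X and "a million trajectories every few weeks" for its own fleet. The three-week interval and the 20-million-trajectory target are my own placeholder assumptions, chosen to stand in for "every few weeks" and "tens of millions," and the calculation assumes a constant rate, which a growing fleet almost certainly would not have.

```python
def weeks_to_collect(target_trajectories: int,
                     trajectories_per_batch: int,
                     weeks_per_batch: float) -> float:
    """Weeks needed to reach a target at a constant collection rate."""
    return target_trajectories / trajectories_per_batch * weeks_per_batch


# From the article: RT-X is "about a million trajectories" and Covariant
# says it collects that many "every few weeks".
TRAJECTORIES_PER_BATCH = 1_000_000

# Assumptions (not from Covariant): "few weeks" read as 3 weeks, and
# "tens of millions" read as 20 million.
ASSUMED_WEEKS_PER_BATCH = 3.0
ASSUMED_TARGET = 20_000_000

weeks = weeks_to_collect(ASSUMED_TARGET, TRAJECTORIES_PER_BATCH,
                         ASSUMED_WEEKS_PER_BATCH)
print(f"{weeks:.0f} weeks (~{weeks / 52:.1f} years)")  # 60 weeks (~1.2 years)
```

Under those assumptions, an RT-X-sized dataset arrives every three weeks and a 20-million-trajectory corpus takes a little over a year. Change either placeholder and the answer shifts proportionally.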

“By building a valuable picking robot that’s deployed across 15 countries with dozens of customers, we essentially have a data collection machine.” —Pieter Abbeel, Covariant

You can think of the current execution of RFM-1 as a prediction engine for suction-based object manipulation in warehouse environments. The model incorporates still images, video, joint angles, force readings, and suction cup strength, which together cover everything involved in the kind of robotic manipulation that Covariant does. All of these things are interconnected within RFM-1, which means that you can put any of those things into one end of RFM-1, and out of the other end of the model will come a prediction. That prediction can be in the form of an image, a video, or a series of commands for a robot.
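One way to picture that "anything in, prediction out" shape is as a request with optional inputs and a chosen output type. The Python sketch below is purely illustrative: Covariant has not published an interface like this, and every class and field name here is hypothetical. It models only the shape of a request, not the model itself.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Sequence


class OutputKind(Enum):
    """The three prediction forms the article mentions."""
    IMAGE = "image"
    VIDEO = "video"
    ROBOT_COMMANDS = "robot_commands"


@dataclass
class ManipulationContext:
    """Hypothetical container: any subset of modalities may be supplied."""
    still_images: Sequence[bytes] = ()
    video_frames: Sequence[bytes] = ()
    joint_angles: Sequence[float] = ()
    force_readings: Sequence[float] = ()
    suction_strength: Optional[float] = None

    def provided_modalities(self) -> list[str]:
        """Names of the modalities that were actually filled in."""
        names = []
        if self.still_images:
            names.append("still_images")
        if self.video_frames:
            names.append("video_frames")
        if self.joint_angles:
            names.append("joint_angles")
        if self.force_readings:
            names.append("force_readings")
        if self.suction_strength is not None:
            names.append("suction_strength")
        return names


@dataclass
class PredictionRequest:
    """Hypothetical request: some context in, one kind of prediction out."""
    context: ManipulationContext
    want: OutputKind

    def describe(self) -> str:
        inputs = ", ".join(self.context.provided_modalities()) or "nothing"
        return f"{inputs} -> {self.want.value}"


request = PredictionRequest(
    context=ManipulationContext(joint_angles=[0.0, 1.2, -0.4],
                                suction_strength=0.8),
    want=OutputKind.VIDEO,
)
print(request.describe())  # joint_angles, suction_strength -> video
```

The point of the sketch is just that no input is mandatory and the output type is a choice, which is what distinguishes this framing from a fixed image-in, grasp-out pipeline.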

What’s important to understand about all of this is that RFM-1 isn’t restricted to picking only things it’s seen before, or only working on robots it has direct experience with. This is what’s nice about foundation models—they can generalize within the domain of their training data, and it’s how Covariant has been able to scale their business as successfully as they have, by not having to retrain for every new picking robot or every new item. What’s counter-intuitive about these large models is that they’re actually better at dealing with new situations than models that are trained specifically for those situations.

For example, let’s say you want to train a model to drive a car on a highway. The question, Abbeel says, is whether it would be worth your time to train on other kinds of driving anyway. The answer is yes, because highway driving is sometimes not highway driving. There will be accidents or rush hour traffic that will require you to drive differently. If you’ve also trained on driving on city streets, you’re effectively training on highway edge cases, which will come in handy at some point and improve performance overall. With RFM-1, it’s the same idea: Training on lots of different kinds of manipulation—different robots, different objects, and so on—means that any single kind of manipulation will be that much more capable.

In the context of generalization, Covariant talks about RFM-1’s ability to “understand” its environment. This can be a tricky word with AI, but what’s relevant is to ground the meaning of “understand” in what RFM-1 is capable of. For example, you don’t need to understand physics to be able to catch a baseball, you just need to have a lot of experience catching baseballs, and that’s where RFM-1 is at. You could also reason out how to catch a baseball with no experience but an understanding of physics, and RFM-1 is not doing this, which is why I hesitate to use the word “understand” in this context.

But this brings us to another interesting capability of RFM-1: it operates as a very effective, if constrained, simulation tool. Because it is a prediction engine that outputs video, you can ask it to generate what the next couple of seconds of an action sequence will look like, and it’ll give you a result that’s both realistic and accurate, being grounded in all of its data. The key here is that RFM-1 can effectively simulate objects that are challenging to simulate traditionally, like floppy things.

Covariant’s Abbeel explains that the “world model” that RFM-1 bases its predictions on is effectively a learned physics engine. “Building physics engines turns out to be a very daunting task to really cover every possible thing that can happen in the world,” Abbeel says. “Once you get complicated scenarios, it becomes very inaccurate, very quickly, because people have to make all kinds of approximations to make the physics engine run on a computer. We’re just doing the large-scale data version of this with a world model, and it’s showing really good results.”

Abbeel gives an example of asking a robot to simulate (or predict) what would happen if a cylinder is placed vertically on a conveyor belt. The prediction accurately shows the cylinder falling over and rolling when the belt starts to move—not because the cylinder is being simulated, but because RFM-1 has seen a lot of things being placed on a lot of conveyor belts.

“Five years from now, it’s not unlikely that what we are building here will be the only type of simulator anyone will ever use.” —Pieter Abbeel, Covariant

This only works if there’s the right kind of data for RFM-1 to train on, so unlike most simulation environments, it can’t currently generalize to completely new objects or situations. But Abbeel believes that with enough data, useful world simulation will be possible. “Five years from now, it’s not unlikely that what we are building here will be the only type of simulator anyone will ever use. It’s a more capable simulator than one built from the ground up with collision checking and finite elements and all that stuff. All those things are so hard to build into your physics engine in any kind of way, not to mention the renderer to make things look like they look in the real world—in some sense, we’re taking a shortcut.”

RFM-1 also incorporates language data to be able to communicate more effectively with humans. Covariant

For Covariant to expand the capabilities of RFM-1 towards that long-term vision of foundation models powering “billions of robots across the world,” the next step is to feed it more data from a wider variety of robots doing a wider variety of tasks. “We’ve built essentially a data ingestion engine,” Abbeel says. “If you’re willing to give us data of a different type, we’ll ingest that too.”

“We have a lot of confidence that this kind of model could power all kinds of robots—maybe with more data for the types of robots and types of situations it could be used in.” —Pieter Abbeel, Covariant

One way or another, that path is going to involve a heck of a lot of data, and it’s going to be data that Covariant is not currently collecting with its own fleet of warehouse manipulation robots. So if you’re, say, a humanoid robotics company, what’s your incentive to share all the data you’ve been collecting with Covariant? “The pitch is that we’ll help them get to the real world,” Covariant co-founder Peter Chen says. “I don’t think there are really that many companies that have AI to make their robots truly autonomous in a production environment. If they want AI that’s robust and powerful and can actually help them enter the real world, we are really their best bet.”

Covariant’s core argument here is that while it’s certainly possible for every robotics company to train up their own models individually, the performance—for anybody trying to do manipulation, at least—would be not nearly as good as using a model that incorporates all of the manipulation data that Covariant already has within RFM-1. “It has always been our long term plan to be a robotics foundation model company,” says Chen. “There was just not sufficient data and compute and algorithms to get to this point—but building a universal AI platform for robots, that’s what Covariant has been about from the very beginning.”
