Feed aggregator



When IEEE Spectrum first wrote about Covariant in 2020, it was a new-ish startup looking to bring robotics to warehouse picking at scale through the magic of a single end-to-end neural network. At the time, Covariant was focused on this picking use case because it represents an application that could provide immediate value—warehouse companies pay Covariant for its robots to pick items in their warehouses. But for Covariant, the exciting part was that picking items in warehouses has, over the last four years, yielded a massive amount of real-world manipulation data—and you can probably guess where this is going.

Today, Covariant is announcing RFM-1, which the company describes as a robotics foundation model that gives robots the “human-like ability to reason.” That’s from the press release, and while I wouldn’t necessarily read too much into “human-like” or “reason,” what Covariant has going on here is pretty cool.

“Foundation model” means that RFM-1 can be trained on more data to do more things—at the moment, it’s all about warehouse manipulation because that’s what it’s been trained on, but its capabilities can be expanded by feeding it more data. “Our existing system is already good enough to do very fast, very variable pick and place,” says Covariant co-founder Pieter Abbeel. “But we’re now taking it quite a bit further. Any task, any embodiment—that’s the long-term vision. Robotics foundation models powering billions of robots across the world.” From the sound of things, Covariant’s business of deploying a large fleet of warehouse automation robots was the fastest way for them to collect the tens of millions of trajectories (how a robot moves during a task) that they needed to train the 8 billion parameter RFM-1 model.


“The only way you can do what we’re doing is by having robots deployed in the world collecting a ton of data,” says Abbeel. “Which is what allows us to train a robotics foundation model that’s uniquely capable.”

There have been other attempts at this sort of thing: The RT-X project is one recent example. But while RT-X depends on research labs sharing what data they have to create a dataset that’s large enough to be useful, Covariant is doing it alone, thanks to its fleet of warehouse robots. “RT-X is about a million trajectories of data,” Abbeel says, “but we’re able to surpass it because we’re getting a million trajectories every few weeks.”

“By building a valuable picking robot that’s deployed across 15 countries with dozens of customers, we essentially have a data collection machine.” —Pieter Abbeel, Covariant

You can think of the current execution of RFM-1 as a prediction engine for suction-based object manipulation in warehouse environments. The model incorporates still images, video, joint angles, force readings, suction cup strength—everything involved in the kind of robotic manipulation that Covariant does. All of these things are interconnected within RFM-1, which means that you can put any of them into one end of the model, and out of the other end will come a prediction. That prediction can be in the form of an image, a video, or a series of commands for a robot.
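Covariant hasn’t published RFM-1’s internals, but the “anything in, prediction out” design described above is typically built by mapping every modality into one shared token vocabulary, so that a single sequence model can consume and predict all of them. Here is a minimal, hypothetical sketch of that tokenization step; the bin counts, value ranges, and function names are illustrative assumptions, not Covariant’s actual design:

```python
def discretize(values, low, high, n_bins, offset):
    """Map continuous sensor readings to integer token IDs in [offset, offset + n_bins)."""
    tokens = []
    for v in values:
        v = min(max(v, low), high)  # clamp to the sensor's expected range
        bin_id = int((v - low) / (high - low) * (n_bins - 1))
        tokens.append(bin_id + offset)
    return tokens

def build_token_stream(pixels, joint_angles, suction_pressures):
    """Flatten several modalities into one token sequence for a single model.

    Each modality gets its own slice of the vocabulary so the model can
    tell an image token apart from a joint-angle token.
    """
    img = discretize(pixels, 0.0, 1.0, 256, offset=0)                 # IDs 0-255
    joints = discretize(joint_angles, -3.14, 3.14, 256, offset=256)   # IDs 256-511
    suction = discretize(suction_pressures, 0.0, 100.0, 256, offset=512)  # IDs 512-767
    return img + joints + suction

# 16 pixels + 3 joint angles + 1 suction reading -> one 20-token stream
stream = build_token_stream([0.2] * 16, [0.1, -1.2, 0.7], [42.0])
print(len(stream))
```

In a real system, a large transformer would be trained to predict the next tokens in such a stream, and the predicted tokens could then be decoded back into whichever modality you want: image, video, or robot commands.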

What’s important to understand about all of this is that RFM-1 isn’t restricted to picking only things it’s seen before, or to working only on robots it has direct experience with. This is what’s nice about foundation models—they can generalize within the domain of their training data, and it’s how Covariant has been able to scale its business as successfully as it has, by not having to retrain for every new picking robot or every new item. What’s counterintuitive about these large models is that they’re actually better at dealing with new situations than models that are trained specifically for those situations.

For example, let’s say you want to train a model to drive a car on a highway. The question, Abbeel says, is whether it would be worth your time to train on other kinds of driving anyway. The answer is yes, because highway driving is sometimes not highway driving. There will be accidents or rush hour traffic that will require you to drive differently. If you’ve also trained on driving on city streets, you’re effectively training on highway edge cases, which will come in handy at some point and improve performance overall. With RFM-1, it’s the same idea: Training on lots of different kinds of manipulation—different robots, different objects, and so on—means that any single kind of manipulation will be that much more capable.

In the context of generalization, Covariant talks about RFM-1’s ability to “understand” its environment. This can be a tricky word with AI, but what’s relevant is to ground the meaning of “understand” in what RFM-1 is capable of. For example, you don’t need to understand physics to be able to catch a baseball, you just need to have a lot of experience catching baseballs, and that’s where RFM-1 is at. You could also reason out how to catch a baseball with no experience but an understanding of physics, and RFM-1 is not doing this, which is why I hesitate to use the word “understand” in this context.

But this brings us to another interesting capability of RFM-1: It operates as a very effective, if constrained, simulation tool. As a prediction engine that outputs video, you can ask it to generate what the next couple of seconds of an action sequence will look like, and it’ll give you a result that’s both realistic and accurate, being grounded in all of its data. The key here is that RFM-1 can effectively simulate objects that are challenging to simulate traditionally, like floppy things.

Covariant’s Abbeel explains that the “world model” that RFM-1 bases its predictions on is effectively a learned physics engine. “Building physics engines turns out to be a very daunting task to really cover every possible thing that can happen in the world,” Abbeel says. “Once you get complicated scenarios, it becomes very inaccurate, very quickly, because people have to make all kinds of approximations to make the physics engine run on a computer. We’re just doing the large-scale data version of this with a world model, and it’s showing really good results.”

Abbeel gives an example of asking a robot to simulate (or predict) what would happen if a cylinder is placed vertically on a conveyor belt. The prediction accurately shows the cylinder falling over and rolling when the belt starts to move—not because the cylinder is being simulated, but because RFM-1 has seen a lot of things being placed on a lot of conveyor belts.

“Five years from now, it’s not unlikely that what we are building here will be the only type of simulator anyone will ever use.” —Pieter Abbeel, Covariant

This only works if there’s the right kind of data for RFM-1 to train on, so unlike most simulation environments, it can’t currently generalize to completely new objects or situations. But Abbeel believes that with enough data, useful world simulation will be possible. “Five years from now, it’s not unlikely that what we are building here will be the only type of simulator anyone will ever use. It’s a more capable simulator than one built from the ground up with collision checking and finite elements and all that stuff. All those things are so hard to build into your physics engine in any kind of way, not to mention the renderer to make things look like they look in the real world—in some sense, we’re taking a shortcut.”

RFM-1 also incorporates language data to be able to communicate more effectively with humans. Covariant

For Covariant to expand the capabilities of RFM-1 towards that long-term vision of foundation models powering “billions of robots across the world,” the next step is to feed it more data from a wider variety of robots doing a wider variety of tasks. “We’ve built essentially a data ingestion engine,” Abbeel says. “If you’re willing to give us data of a different type, we’ll ingest that too.”

“We have a lot of confidence that this kind of model could power all kinds of robots—maybe with more data for the types of robots and types of situations it could be used in.” —Pieter Abbeel, Covariant

One way or another, that path is going to involve a heck of a lot of data, and it’s going to be data that Covariant is not currently collecting with its own fleet of warehouse manipulation robots. So if you’re, say, a humanoid robotics company, what’s your incentive to share all the data you’ve been collecting with Covariant? “The pitch is that we’ll help them get to the real world,” Covariant co-founder Peter Chen says. “I don’t think there are really that many companies that have AI to make their robots truly autonomous in a production environment. If they want AI that’s robust and powerful and can actually help them enter the real world, we are really their best bet.”

Covariant’s core argument here is that while it’s certainly possible for every robotics company to train up its own models individually, the performance—for anybody trying to do manipulation, at least—would be not nearly as good as using a model that incorporates all of the manipulation data that Covariant already has within RFM-1. “It has always been our long-term plan to be a robotics foundation model company,” says Chen. “There was just not sufficient data and compute and algorithms to get to this point—but building a universal AI platform for robots, that’s what Covariant has been about from the very beginning.”



The global ocean is difficult to explore—the common refrain is that we know less about the deep ocean than we do about the surface of the moon. Australian company Advanced Navigation wants to change that with a pint-sized autonomous underwater vehicle (AUV) that it hopes will become the maritime equivalent of a consumer drone. And the new AUV is already getting to work mapping and monitoring Australia’s coral reefs and diving for shipwrecks.

The Sydney-based company has been developing underwater navigation technology for more than a decade. In 2022, Advanced Navigation unveiled its first in-house AUV, called Hydrus. At less than half a meter long, the vehicle is considerably smaller than most alternatives. Even so, it’s fully autonomous and carries a 4k-resolution camera capable of 60 frames per second that can both capture high-definition video and construct detailed 3D photogrammetry models.

Advanced Navigation says Hydrus—with a depth rating of 3,000 meters, a range of 9 kilometers, and a battery that lasts up to three hours—is capable of a wide variety of missions. The company recently sold two units to the Australian Institute of Marine Science (AIMS), the country’s tropical marine science agency, which will use them to survey coral reefs in the North West Shelf region off the country’s west coast. Hydrus has also recently collaborated with the Western Australian Museum to produce a detailed 3D model of a shipwreck off the coast near Perth.

“If people can go and throw one of these off the boat, just like they can throw a drone up in the air, that will obviously benefit the exploration of the sea.” —Ross Anderson, Western Australian Museum

After many years of supplying components to other robotics companies, Advanced Navigation spotted a gap in the market, says Peter Baker, the company’s subsea product manager. “We wanted to take the user experience that someone would have with an aerial drone and bring that underwater,” he says. “It’s very expensive to get images and data of the seabed. So by being able to miniaturize this system, and have it drastically simplified from the user’s point of view, it makes data a lot more accessible to people.”

But building a compact and low-cost AUV is not simple. The deep ocean is not a friendly place for electronics, says Baker, due to a combination of high pressure and corrosive seawater. The traditional way of dealing with this is to stick all the critical components in a sealed titanium tube that can withstand the ambient pressure and keep moisture out. However, this requires you to add buoyancy to compensate for the extra weight, says Baker, which increases the bulk of the vehicle. That means bigger motors and bigger batteries. “The whole thing spirals up and up until you’ve got something the size of a minibus,” he says.

Advanced Navigation got around the spiral by designing bespoke pressure-tolerant electronics. The company built all of its circuit boards from scratch, carefully selecting components that had been tested to destruction in a hydrostatic pressure chamber. These were then encapsulated in a waterproof composite shell, and to further reduce the risk of water ingress, the drone operates completely wirelessly. Batteries are recharged using inductive charging, and data transfer is either over Wi-Fi when above water or via an optical modem when below the surface.

Hydrus AUVs are charged using induction to keep corrosive seawater from leaking in through charging ports. Advanced Navigation

This has allowed the company to significantly miniaturize the system, says Baker, which has a drastic impact on the overall cost of operations. “You don’t need a crane or a winch or anything like that to recover the vehicle, you can pick it up with a fishing net,” he says. “You can get away with using a much smaller boat, and the rule of thumb in the industry is if you double the size of your boat, you quadruple the cost.”

Just as important, though, is the vehicle’s ease of use. Most underwater robotics systems still operate with a tether, says Baker, but Hydrus carries all the hardware required to support autonomous navigation on board. The company’s “bread and butter” is inertial navigation technology, which uses accelerometers and gyroscopes to track the vehicle from a known starting point. But the drone also features a sonar system that allows it to stay a set distance from the seabed and also judge its speed by measuring the Doppler shift on echoes as they bounce back.
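Advanced Navigation hasn’t detailed Hydrus’s navigation filter (production systems typically fuse these sensors with something like a Kalman filter), but the essence of inertial navigation plus Doppler-measured speed is dead reckoning: integrating heading and velocity forward from a known starting fix. A toy 2D sketch of that idea, with all names and numbers illustrative:

```python
import math

def dead_reckon(start_x, start_y, heading, samples, dt):
    """Integrate gyro yaw rate and Doppler-measured forward speed from a known fix.

    samples: list of (yaw_rate_rad_per_s, forward_speed_m_per_s) sensor pairs,
    one per timestep of length dt seconds. Without external position fixes,
    small sensor errors accumulate into drift over time.
    """
    x, y = start_x, start_y
    for yaw_rate, speed in samples:
        heading += yaw_rate * dt              # gyroscope: update orientation
        x += speed * math.cos(heading) * dt   # advance along current heading
        y += speed * math.sin(heading) * dt
    return x, y, heading

# Straight run: heading 0, no turning, 1 m/s for 10 one-second steps
x, y, h = dead_reckon(0.0, 0.0, 0.0, [(0.0, 1.0)] * 10, dt=1.0)
print(x, y, h)  # ends roughly 10 m east of the start
```

The accumulating-drift problem is why a real AUV also anchors itself against external references, such as the sonar altitude measurements mentioned above.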

This means that users can simply program in a set of waypoints on a map, toss the vehicle overboard, and leave it to its own devices, says Baker. The Hydrus does have a low-bandwidth acoustic communication channel that allows the operator to send basic commands like “stop” or “come home,” he says, but Hydrus is designed to be a set-and-forget AUV. “That really lowers the threshold of what a user needs to be able to operate it,” he says. “If you can fly a DJI drone you could fly a Hydrus.”

The company estimates that for a typical seabed investigation in water shallow enough for human divers, the Hydrus could be 75 percent cheaper than alternatives. And the savings would go up significantly at greater depths. What’s more, says Baker, the drone’s precise navigation means it can produce much more consistent and repeatable data.

To demonstrate its capabilities, Hydrus’s designers went hunting for shipwrecks in the Rottnest ships graveyard just off the coast near Perth, in Western Australia. The site was a designated spot for scuttling aging ships, says Ross Anderson, curator at the Western Australian Museum, but it has yet to be fully explored due to the depth of many of the wrecks.

The Advanced Navigation team used the Hydrus to create a detailed 3D model of a sunken “coal hulk”—one of a category of old iron sailing ships that were later converted to floating coal warehouses for steamships. The Western Australian Museum has been unable to identify the vessel so far, but Anderson says these kinds of models can be hugely beneficial for carrying out maritime archaeology research, as well as for educating people about what’s below the waves.


Advanced Navigation used its new Hydrus drone to create a detailed 3D image of an unidentified “coal hulk” ship in the Rottnest ships graveyard off the western coast of Australia.

Advanced Navigation

Any technology that can simplify the process is greatly welcomed, Anderson adds. “If people can go and throw one of these off the boat, just like they can throw a drone up in the air, that will obviously benefit the exploration of the sea,” he says.

Ease of use was also a big driver behind AIMS’s purchase of two Hydrus drones, says technology development program lead Melanie Olsen, who is also an IEEE senior member. Most of the technology available for marine science is still research-grade and a long way from a polished, professional product.

“When you’re an operational agency like AIMS, you typically don’t have the luxury of spending a lot of time on the back of the boat getting equipment ready,” says Olsen. “You need something that users can turn on and go and it’s just working, as time is of the essence.”

Another benefit of the Hydrus for AIMS is that the drone can operate at greater depths than divers and in conditions that would be dangerous for humans. “It’s enabling our researchers to see further down in the water and also operate in more dangerous situations such as at night, or in the presence of threats such as crocodiles or sharks, places where we just wouldn’t be able to collect that data,” says Olsen.

The agency will initially use the drones to survey reefs on Australia’s North West Shelf, including Scott Reef and Ashmore Reef. The goal is to collect regular data on coral health to monitor the state of the reefs, investigate how they’re being affected by climate change, and hopefully get early warning of emerging problems. But Olsen says the agency expects that the Hydrus will become a standard part of its ocean monitoring toolkit going forward.

This story was updated on 11 March 2024 to correct the year when Advanced Navigation unveiled Hydrus.






Video Friday is your weekly selection of awesome robotics videos, collected by your friends at IEEE Spectrum robotics. We also post a weekly calendar of upcoming robotics events for the next few months. Please send us your events for inclusion.

HRI 2024: 11–15 March 2024, BOULDER, COLO.
Eurobot Open 2024: 8–11 May 2024, LA ROCHE-SUR-YON, FRANCE
ICRA 2024: 13–17 May 2024, YOKOHAMA, JAPAN
RoboCup 2024: 17–22 July 2024, EINDHOVEN, NETHERLANDS

Enjoy today’s videos!

We present Human to Humanoid (H2O), a reinforcement learning (RL) based framework that enables real-time, whole-body teleoperation of a full-sized humanoid robot with only an RGB camera. We successfully achieve teleoperation of dynamic, whole-body motions in real-world scenarios, including walking, back jumping, kicking, turning, waving, pushing, boxing, etc. To the best of our knowledge, this is the first demonstration to achieve learning-based, real-time, whole-body humanoid teleoperation.

[ CMU ]

Legged robots have the potential to traverse complex terrain and access confined spaces beyond the reach of traditional platforms thanks to their ability to carefully select footholds and flexibly adapt their body posture while walking. However, robust deployment in real-world applications is still an open challenge. In this paper, we present a method for legged locomotion control using reinforcement learning and 3D volumetric representations to enable robust and versatile locomotion in confined and unstructured environments.

[ Takahiro Miki ]

Sure, 3.3 meters per second is fast for a humanoid, but I’m more impressed by the spinning around while walking downstairs.

[ Unitree ]

Improving the safety of collaborative manipulators necessitates the reduction of inertia in the moving part. We introduce a novel approach in the form of a passive, 3D wire aligner, serving as a lightweight and low-friction power transmission mechanism, thus achieving the desired low inertia in the manipulator’s operation.

[ SAQIEL ]

Thanks, Temma!

Robot Era just launched Humanoid-Gym, an open-source reinforcement learning framework for bipedal humanoids. As you can see from the video, RL algorithms have given the robot, called Xiao Xing, or XBot, the ability to climb up and down haphazardly stacked boxes with relative stability and ease.

[ Robot Era ]

“Impact-Aware Bimanual Catching of Large-Momentum Objects.” Need I say more?

[ SLMC ]

More than 80% of stroke survivors experience walking difficulty, significantly impacting their daily lives, independence, and overall quality of life. Now, new research from the University of Massachusetts Amherst pushes forward the bounds of stroke recovery with a unique robotic hip exoskeleton, designed as a training tool to improve walking function. This invites the possibility of new therapies that are more accessible and easier to translate from practice to daily life, compared to current rehabilitation methods.

[ UMass Amherst ]

Thanks, Julia!

The manipulation here is pretty impressive, but it’s hard to know how impressive without also knowing how much the video was sped up.

[ Somatic ]

DJI drones work to make the world a better place and one of the ways that we do this is through conservation work. We partnered with Halo Robotics and the OFI Orangutan Foundation International to showcase just how these drones can make an impact.

[ DJI ]

The aim of the test is to demonstrate the removal and replacement of satellite modules into a 27U CubeSat format using augmented reality control of a robot. In this use case, the “client” satellite is being upgraded and refueled using modular componentry. The robot will then remove the failed computer module and place it in a fixture. It will then do the same with the propellant tank. The robot will then place these correctly back into the satellite.

[ Extend Robotics ]

This video features some of the highlights and favorite moments from the CYBATHLON Challenges 2024 that took place on 2 February, showing so many diverse types of assistive technology taking on discipline tasks and displaying pilots’ tenacity and determination. The Challenges saw new teams, new tasks, and new formats for many of the CYBATHLON disciplines.

[ Cybathlon ]

It’s been a long road to electrically powered robots.

[ ABB ]

Small drones for catastrophic wildfires (ones covering more than [40,470 hectares]) are like bringing a flashlight to light up a football field. This short video describes the major uses for drones of all sizes and why and when they are used, or why not.

[ CRASAR ]

It probably will not surprise you that there are a lot of robots involved in building Rivian trucks and vans.

[ Kawasaki Robotics ]

DARPA’s Learning Introspective Control (LINC) program is developing machine learning methods that show promise in making that scenario closer to reality. LINC aims to fundamentally improve the safety of mechanical systems—specifically in ground vehicles, ships, drone swarms, and robotics—using various methods that require minimal computing power. The result is an AI-powered controller the size of a cell phone.

[ DARPA ]




[ Extend Robotics ]

This video features some of the highlights and favorite moments from the CYBATHLON Challenges 2024 that took place on 2 February, showing so many diverse types of assistive technology taking on discipline tasks and displaying pilots’ tenacity and determination. The Challenges saw new teams, new tasks, and new formats for many of the CYBATHLON disciplines.

[ Cybathlon ]

It’s been a long road to electrically powered robots.

[ ABB ]

Small drones for catastrophic wildfires (ones covering more than [40,470 hectares]) are like bringing a flashlight to light up a football field. This short video describes the major uses for drones of all sizes and why and when they are used, or why not.

[ CRASAR ]

It probably will not surprise you that there are a lot of robots involved in building Rivian trucks and vans.

[ Kawasaki Robotics ]

DARPA’s Learning Introspective Control (LINC) program is developing machine learning methods that show promise in making that scenario closer to reality. LINC aims to fundamentally improve the safety of mechanical systems—specifically in ground vehicles, ships, drone swarms, and robotics—using various methods that require minimal computing power. The result is an AI-powered controller the size of a cell phone.

[ DARPA ]

This survey reviews advances in 3D object detection approaches for autonomous driving. A brief introduction to 2D object detection is first given, and the drawbacks of existing methodologies in highly dynamic environments are identified. The paper then reviews state-of-the-art 3D object detection techniques that use monocular and stereo vision for reliable detection in urban settings. Based on depth-inference basis, learning schemes, and internal representation, this work presents a taxonomy of three classes: model-based and geometrically constrained approaches, end-to-end learning methodologies, and hybrid methods. The current trend toward multi-view detectors as end-to-end methods is highlighted for their improved robustness. Detectors from the last two classes were specially selected to exploit the autonomous-driving context in terms of geometry, scene content, and instance distribution. To demonstrate the effectiveness of each method, 3D object detection datasets for autonomous vehicles are described along with their distinctive features (e.g., varying weather conditions, multimodality, multi-camera perspectives) and the metrics associated with different difficulty categories. In addition, multimodal visual datasets (i.e., V2X) that may tackle the problem of single-view occlusion are included. Finally, current research trends in object detection are summarized, followed by a discussion of possible directions for future research in this domain.



You’ve seen this before: a truck unloading robot that’s made up of a mobile base with an arm on it that drives up into the back of a trailer and then uses suction to grab stacked boxes and put them onto a conveyor belt. We’ve written about a couple of the companies doing this, and there are even more out there. It’s easy to understand why—trailer unloading involves a fairly structured and controlled environment with a very repetitive task, it’s a hard job that sucks for humans, and there’s an enormous amount of demand.

While it’s likely true that there’s enough room for a whole bunch of different robotics companies in the trailer unloading space, a given customer is probably only going to pick one, and they’re going to pick the one that offers the right combination of safety, capability, and cost. Anyware Robotics thinks they have that mix, aided by a box handling solution that is both very clever and so obvious that I’m wondering why I didn’t think of it myself.

The overall design of Pixmo itself is fairly standard as far as trailer unloading robots go, but some of the details are interesting. We’re told that Pixmo is the only trailer unloading system that integrates a heavy-payload collaborative arm, actually a fairly new commercial arm from Fanuc. This means that Anyware Robotics doesn’t have to faff about with their own hardware, and also that their robot is arguably safer, being ISO certified safe to work directly with people. The base is custom, but Anyware is contracting it out to a big robotics OEM.

“We’ve put a lot of effort into making sure that most of the components of our robot are off-the-shelf,” co-founder and CEO Thomas Tang tells us. “There are already so many mature and cost-efficient suppliers that we want to offload the supply chain, the certification, the reliability testing onto someone else’s shoulders.” And while there is a selection of autonomous mobile robots (AMRs) out there that seem like they could get the job done, the problem is that they’re all designed for flat surfaces, and getting into and out of the back of a trailer often involves a short, steep ramp, hence the need for a custom design. Even with the custom base, Tang says that Pixmo is very cost-efficient; the company predicts that it will be approximately one-third the cost of other solutions, with a payback period of about 24 months.

But here’s the really clever bit:

Anyware Robotics Pixmo Trailer Unloading

That conveyor system in front of the boxes is an add-on that’s used in support of Pixmo. There are two benefits here: first, having the conveyor add-on aligned with the base of a box minimizes the amount of lifting that Pixmo has to do. This allows Pixmo to handle boxes of up to 65 pounds (about 29 kilograms) with a lift-and-slide technique, putting it at the top end of trailer-unloading-robot payloads. And the second benefit is that the add-on system decreases the distance that Pixmo has to move the box to just about as small as it can possibly be, eliminating the need for the arm to rotate around to place a box on a conveyor next to or behind itself. Shortening the cycle time this way means that Pixmo can achieve a throughput of up to 1,000 boxes per hour—roughly one box every 3.6 seconds, which the Internet suggests is quite fast, even for a professional human. Anyware Robotics is introducing this add-on system at MODEX next week, and they have a patent pending on the idea.
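As a quick sanity check on those numbers, the per-box cycle time follows directly from the hourly rate. This is a back-of-envelope sketch, not anything from Anyware:

```python
# Back-of-envelope check: an hourly throughput implies a per-box cycle time.
def cycle_time_seconds(boxes_per_hour: float) -> float:
    """Seconds available per box at a given hourly throughput."""
    return 3600.0 / boxes_per_hour

# 1,000 boxes per hour leaves 3.6 seconds per pick.
print(cycle_time_seconds(1000))
```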

This seems like such a simple, useful idea that I asked Tang why they were the first ones to come up with it. “In robotics startups, there tends to be a legacy mindset issue,” Tang told me. “When people have been working on robot arms for so many years, we just think about how to use robot arms to solve everything. Maybe that’s the reason why other companies didn’t come up with this solution.” Tang says that Anyware started with much more complicated add-on designs before finding this solution. “Usually it’s the simplest solution that has the most trial and error behind it.”

Anyware Robotics is focused on trailer unloading for now, but Pixmo could easily be adapted for palletizing and depalletizing or somewhat less easily for other warehouse tasks like order picking or machine tending. But why stop there? A mobile manipulator can (theoretically) do it all (almost), and that’s exactly what Tang wants:

In our long-term vision, we believe that the future will have two different types of general purpose robots. In one direction is the humanoid form, which is a really flexible solution for jobs where you want to replace a human. But there are so many jobs that are just not reasonable for a human body to do. So we believe there should be another form of general purpose robot, which is designed for industrial tasks. Our design philosophy is in that direction—it’s also general purpose, but for industrial applications.

At just over one year old, Anyware has already managed to complete a pilot program (and convert it to a purchase order). They’re currently in the middle of several other pilot programs with leading third-party logistics providers, and they expect to spend the next several months focusing on productization with the goal of releasing the first commercial version of Pixmo by July of this year.




Cobots are robots built for human-robot collaboration (HRC) in a shared environment. In the aftermath of disasters, cobots can cooperate with humans to mitigate risks and increase the chances of rescuing people in distress. This study examines the resilient and dynamic synergy between a swarm of snake robots, first responders, and people to be rescued. The framework implements the delivery of first aid to potential victims dispersed around a disaster environment. In the HRC simulation framework presented in this study, the first responder initially deploys a UAV, a swarm of snake robots, and emergency items. The UAV provides the first responder with the site planimetry, which includes the layout of the area as well as the precise locations of the individuals in need of rescue and the aiding goods to be delivered. Each snake robot in the swarm is then assigned a victim. Subsequently, each snake robot determines an optimal path using the A* algorithm to approach and reach its respective target while avoiding obstacles. Using their prehensile capabilities, the snake robots adeptly grasp the aiding objects to be dispatched. The snake robots successively arrive at the delivery location near the victim, following their optimal paths, and proceed to release the items. To demonstrate the potential of the framework, several case studies are outlined concerning the execution of operations that combine locomotion, obstacle avoidance, grasping, and deploying. The CoppeliaSim robotic simulator is utilised for this framework. The analysis of the motion of the snake robots on the path shows highly accurate movement with and without the emergency item. This study is a step towards a holistic semi-autonomous search and rescue operation.
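The A* planning step the abstract mentions can be illustrated with a minimal grid-based sketch. This uses a toy occupancy grid and 4-connected moves; the paper's actual environment, cost functions, and snake-robot kinematics are not reproduced here:

```python
# Minimal A* on an occupancy grid (0 = free, 1 = obstacle),
# 4-connected moves, Manhattan-distance heuristic.
import heapq

def astar(grid, start, goal):
    """Return the shortest path from start to goal as a list of cells, or None."""
    rows, cols = len(grid), len(grid[0])
    def h(p):  # admissible heuristic: Manhattan distance to the goal
        return abs(p[0] - goal[0]) + abs(p[1] - goal[1])
    open_set = [(h(start), 0, start, [start])]  # (f, g, node, path so far)
    seen = set()
    while open_set:
        f, g, node, path = heapq.heappop(open_set)
        if node == goal:
            return path
        if node in seen:
            continue
        seen.add(node)
        r, c = node
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                heapq.heappush(open_set,
                               (g + 1 + h((nr, nc)), g + 1, (nr, nc), path + [(nr, nc)]))
    return None  # goal unreachable

# A wall across the middle forces the planner around the right side.
grid = [[0, 0, 0],
        [1, 1, 0],
        [0, 0, 0]]
path = astar(grid, (0, 0), (2, 0))
```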



Video Friday is your weekly selection of awesome robotics videos, collected by your friends at IEEE Spectrum robotics. We also post a weekly calendar of upcoming robotics events for the next few months. Please send us your events for inclusion.

HRI 2024: 11–15 March 2024, BOULDER, COLORADO, USA
Eurobot Open 2024: 8–11 May 2024, LA ROCHE-SUR-YON, FRANCE
ICRA 2024: 13–17 May 2024, YOKOHAMA, JAPAN
RoboCup 2024: 17–22 July 2024, EINDHOVEN, NETHERLANDS

Enjoy today’s videos!

Figure has raised a US $675 million Series B, valuing the company at $2.6 billion.

[ Figure ]

Meanwhile, here’s how things are going at Agility Robotics, whose last raise was a $150 million Series B in April of 2022.

[ Agility Robotics ]

Also meanwhile, here’s how things are going at Sanctuary AI, whose last raise was a $58.5 million Series A in March of 2022.

[ Sanctuary AI ]

The time has come for humanoid robots to enter industrial production lines and learn how to assist humans by undertaking repetitive, tedious, and potentially dangerous tasks for them. Recently, UBTECH’s humanoid robot Walker S was introduced into the assembly line of NIO’s advanced vehicle-manufacturing center, as an “intern” assisting in the car production. Walker S is the first bipedal humanoid robot to complete a specific workstation’s tasks on a mobile EV production line.

[ UBTECH ]

Henry Evans keeps working hard to make robots better, this time with the assistance of researchers from Carnegie Mellon University.

Henry said he preferred using head-worn assistive teleoperation (HAT) with a robot for certain tasks rather than depending on a caregiver. “Definitely scratching itches,” he said. “I would be happy to have it stand next to me all day, ready to do that or hold a towel to my mouth. Also, feeding me soft foods, operating the blinds, and doing odd jobs around the room.”
One innovation in particular, software called Driver Assistance that helps align the robot’s gripper with an object the user wants to pick up, was “awesome,” Henry said. Driver Assistance leaves the user in control while it makes the fine adjustments and corrections that can make controlling a robot both tedious and demanding. “That’s better than anything I have tried for grasping,” Henry said, adding that he would like to see Driver Assistance used for every interface that controls Stretch robots.
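The shared-control idea behind Driver Assistance (the user stays in charge while the software makes fine corrections toward the target) can be sketched as a bounded blend of two signals. The function name, gain, and clamping rule below are illustrative assumptions, not CMU's implementation:

```python
# Illustrative shared-control blend: the user's velocity command passes
# through, plus a small, clamped autonomous correction toward the target.
def assisted_command(user_cmd, gripper_pos, target_pos,
                     gain=0.3, max_correction=0.05):
    """Per-axis blend of a user command with a bounded correction (m/s)."""
    assisted = []
    for u, g, t in zip(user_cmd, gripper_pos, target_pos):
        # Proportional pull toward the target, clamped so the user dominates.
        correction = max(-max_correction, min(max_correction, gain * (t - g)))
        assisted.append(u + correction)
    return assisted

# The user drives in x; the assistant adds a small nudge toward the target.
cmd = assisted_command([0.1, 0.0, 0.0], [0.0, 0.0, 0.0], [0.5, 0.02, 0.0])
```

Because the correction is clamped, the autonomy can only fine-tune, never override, which matches the "user stays in control" behavior Henry describes.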

[ HAT2 ] via [ CMU ]

Watch this video for the three glorious seconds at the end.

[ Tech United ]

Get ready to rip, shear, mow, and tear, as DOOM is back! This April, we’re making the legendary game playable on our robotic mowers as a tribute to 30 years of mowing down demons.

Oh, it’s HOOSKvarna, not HUSKvarna.

[ Husqvarna ] via [ Engadget ]

Latest developments demonstrated on the Ameca Desktop platform. Having fun with vision- and voice-cloning capabilities.

[ Engineered Arts ]

Could an artificial-intelligence system learn language from a child? New York University researchers supported by the National Science Foundation, using first-person video from a head-mounted camera, trained AI models to learn language through the eyes and ears of a child.

[ NYU ]

The world’s leaders in manufacturing, natural resources, power, and utilities are using our autonomous robots to gather higher-quality data in greater quantities than ever before. Thousands of Spots have been deployed around the world—more than any other walking robot—to tackle this challenge. This release helps maintenance teams tap into the power of AI with new software capabilities and Spot enhancements.

[ Boston Dynamics ]

Modular self-reconfigurable robotic systems are more adaptive than conventional systems. This article proposes a novel free-form and truss-structured modular self-reconfigurable robot called FreeSN, containing node and strut modules. This article presents a novel configuration identification system for FreeSN, including connection point magnetic localization, module identification, module orientation fusion, and system-configuration fusion.

[ Freeform Robotics ]

The OOS-SIM (On-Orbit Servicing Simulator) is a simulator for on-orbit servicing tasks such as repair, maintenance and assembly that have to be carried out on satellites orbiting the earth. It simulates the operational conditions in orbit, such as the felt weightlessness and the harsh illumination.

[ DLR ]

The next CYBATHLON competition, which will take place again in 2024, breaks down barriers between the public, people with disabilities, researchers and technology developers. From 25 to 27 October 2024, the CYBATHLON will take place in a global format in the Arena Schluefweg in Kloten near Zurich and in local hubs all around the world.

[ CYBATHLON ]

George’s story is a testament to the incredible journey that unfolds when passion, opportunity, and community converge. His journey from drone enthusiast to someone actively making a difference, not only in his local community but also globally, serves as a beacon of hope for all who dare to dream and pursue their passions.

[ WeRobotics ]

In case you’d forgotten, Amazon has a lot of robots.

[ Amazon Robotics ]

ABB’s fifty-year story of robotic innovation began in 1974 with the sale of the world’s first commercial all-electric robot, the IRB 6. Björn Weichbrodt was a key figure in the development of the IRB 6.

[ ABB ]

Robotics Debate of the Ingenuity Labs Robotics and AI Symposium (RAIS2023) from October 12, 2023: Is robotics helping or hindering our progress on UN Sustainable Development Goals?

[ Ingenuity Labs ]



Today, Figure is announcing an astonishing US $675 million Series B raise, which values the company at an even more astonishing $2.6 billion. Figure is one of the companies working towards a multi or general purpose (depending on who you ask) bipedal or humanoid (depending on who you ask) robot. The astonishing thing about this valuation is that Figure’s robot is still very much in the development phase—although they’re making rapid progress, which they demonstrate in a new video posted this week.

This round of funding comes from Microsoft, OpenAI Startup Fund, Nvidia, Jeff Bezos (through Bezos Expeditions), Parkway Venture Capital, Intel Capital, Align Ventures, and ARK Invest. Figure says that they’re going to use this new capital “for scaling up AI training, robot manufacturing, expanding engineering headcount, and advancing commercial deployment efforts.” In addition, Figure and OpenAI will be collaborating on the development of “next generation AI models for humanoid robots” which will “help accelerate Figure’s commercial timeline by enhancing the capabilities of humanoid robots to process and reason from language.”

As far as that commercial timeline goes, here’s the most recent update:

Figure

And to understand everything that’s going on here, we sent a whole bunch of questions to Jenna Reher, Senior Robotics/AI Engineer at Figure.

What does “fully autonomous” mean, exactly?

Jenna Reher: In this case, we simply put the robot on the ground and hit go on the task with no other user input. What you see is using a learned vision model for bin detection that allows us to localize the robot relative to the target bin and get the bin pose. The robot can then navigate itself to within reach of the bin, determine grasp points based on the bin pose, and detect grasp success through the measured forces on the hands. Once the robot turns and sees the conveyor the rest of the task rolls out in a similar manner. By doing things in this way we can move the bins and conveyor around in the test space or start the robot from a different position and still complete the task successfully.
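The task flow Reher describes (localize the bin with a learned vision model, navigate within reach, grasp, and verify success through measured hand forces) can be sketched as a simple sequence. The step names, stub interfaces, and force threshold here are illustrative assumptions, not Figure's software:

```python
# Illustrative sketch of the autonomous tote-handling sequence described above.
# The callables stand in for perception, locomotion, and manipulation subsystems.
def run_task(detect_bin_pose, navigate_to, grasp_at, measured_grasp_force,
             force_threshold=5.0):
    """Run the pick sequence; return the list of steps completed."""
    steps = []
    bin_pose = detect_bin_pose()        # learned vision model localizes the bin
    steps.append("localized_bin")
    navigate_to(bin_pose)               # walk to within reach of the bin
    steps.append("navigated")
    grasp_at(bin_pose)                  # grasp points derived from the bin pose
    if measured_grasp_force() < force_threshold:
        steps.append("grasp_failed")    # hand forces say the grasp didn't hold
        return steps
    steps.append("grasp_verified")      # success detected through hand forces
    return steps

# Stubbed-out run: perception returns a pose, the grasp holds firmly.
steps = run_task(lambda: (1.0, 2.0, 0.0),
                 lambda pose: None,
                 lambda pose: None,
                 lambda: 10.0)
```

Because each step is driven by perception rather than fixed waypoints, moving the bin or starting the robot elsewhere doesn't break the sequence, which is the point Reher makes about the demo.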

How many takes did it take to get this take?

Reher: We’ve been running this use case consistently for some time now as part of our work in the lab, so we didn’t really have to change much for the filming here. We did two or three practice runs in the morning and then three filming takes. All of the takes were successful, so the extras were to make sure we got the cleanest one to show.

What’s back in the Advanced Actuator Lab?

Reher: We have an awesome team of folks working on some exciting custom actuator designs for our future robots, as well as supporting and characterizing the actuators that went into our current robots.

That’s a very specific number for “speed vs human.” Which human did you measure the robot’s speed against?

Reher: We timed Brett [Adcock, founder of Figure] and a few poor engineers doing the task and took the average to get a rough baseline. If you are observant, that seemingly over-specific number is just saying we’re at 1/6 human speed. The main point that we’re trying to make here is that we are aware we are currently below human speed, and it’s an important metric to track as we improve.

What’s the tether for?

Reher: For this task we currently process the camera data off-robot while all of the behavior planning and control happens onboard in the computer that’s in the torso. Our robots should be fully tetherless in the near future as we finish packaging all of that onboard. We’ve been developing behaviors quickly in the lab here at Figure in parallel to all of the other systems engineering and integration efforts happening, so hopefully folks notice all of these subtle parallel threads converging as we try to release regular updates.

How the heck do you keep your robotics lab so clean?

Reher: Everything we’ve filmed so far is in our large robot test lab, so it’s a lot easier to keep the area clean when people’s desks aren’t intruding in the space. Definitely no guarantees on that level of cleanliness if the camera were pointed in the other direction!

Is the robot in the background doing okay?

Reher: Yes! The other robot was patiently standing there in the background, waiting for the filming to finish up so that our manipulation team could get back to training it to do more manipulation tasks. We hope we can share some more developments with that robot as the main star in the near future.

What would happen if I put a single bowling ball into that tote?

Reher: A bowling ball is particularly menacing to this task primarily due to the moving mass, in addition to the impact if you are throwing it in. The robot would in all likelihood end up dropping the tote, stay standing, and abort the task. With what you see here, we assume that the mass of the tote is known a-priori so that our whole body controller can compensate for the external forces while tracking the manipulation task. Reacting to and estimating larger unknown disturbances such as this is a challenging problem, but we’re definitely working on it.

Tell me more about that very zen arm and hand pose that the robot adopts after putting the tote on the conveyor.

Reher: It does look kind of zen! If you re-watch our coffee video you’ll notice the same pose after the robot gets things brewing. This is a reset pose that our controller will go into between manipulation tasks while the robot is awaiting commands to execute either an engineered behavior or a learned policy.

Are the fingers less fragile than they look?

Reher: They are more robust than they look, but not impervious to damage by any means. The design is pretty modular which is great, meaning that if we damage one or two fingers there is a small number of parts to swap to get everything back up and running. The current fingers won’t necessarily survive a direct impact from a bad fall, but can pick up totes and do manipulation tasks all day without issues.

Is the Figure logo footsteps?

Reher: One of the reasons I really like the figure logo is that it has a bunch of different interpretations depending on how you look at it. In some cases it’s just an F that looks like a footstep plan rollout, while some of the logo animations we have look like active stepping. One other possible interpretation could be an occupancy grid.


Today, Figure is announcing an astonishing US $675 million Series B raise, which values the company at an even more astonishing $2.6 billion. Figure is one of the companies working towards a multi- or general-purpose (depending on who you ask) bipedal or humanoid (depending on who you ask) robot. The astonishing thing about this valuation is that Figure’s robot is still very much in the development phase—although they’re making rapid progress, which they demonstrate in a new video posted this week.

This round of funding comes from Microsoft, OpenAI Startup Fund, Nvidia, Jeff Bezos (through Bezos Expeditions), Parkway Venture Capital, Intel Capital, Align Ventures, and ARK Invest. Figure says that they’re going to use this new capital “for scaling up AI training, robot manufacturing, expanding engineering headcount, and advancing commercial deployment efforts.” In addition, Figure and OpenAI will be collaborating on the development of “next generation AI models for humanoid robots” which will “help accelerate Figure’s commercial timeline by enhancing the capabilities of humanoid robots to process and reason from language.”

As far as that commercial timeline goes, here’s the most recent update:

Figure

And to understand everything that’s going on here, we sent a whole bunch of questions to Jenna Reher, Senior Robotics/AI Engineer at Figure.

What does “fully autonomous” mean, exactly?

Jenna Reher: In this case, we simply put the robot on the ground and hit go on the task with no other user input. What you see is using a learned vision model for bin detection that allows us to localize the robot relative to the target bin and get the bin pose. The robot can then navigate itself to within reach of the bin, determine grasp points based on the bin pose, and detect grasp success through the measured forces on the hands. Once the robot turns and sees the conveyor the rest of the task rolls out in a similar manner. By doing things in this way we can move the bins and conveyor around in the test space or start the robot from a different position and still complete the task successfully.
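The perceive–navigate–grasp–verify loop Reher describes can be sketched as a simple state machine. Everything below—state names, the force threshold, the `step` function—is an illustrative assumption for readers, not Figure's actual software:

```python
from enum import Enum, auto

class State(Enum):
    DETECT_BIN = auto()
    NAVIGATE = auto()
    GRASP = auto()
    VERIFY_GRASP = auto()
    PLACE = auto()
    DONE = auto()

# Assumed minimum measured hand force that confirms a successful grasp.
GRASP_FORCE_THRESHOLD_N = 5.0

def step(state, bin_pose=None, hand_force_n=0.0):
    """Advance the task one stage; a pure function for clarity."""
    if state is State.DETECT_BIN:
        # A learned vision model would supply bin_pose here.
        return State.NAVIGATE if bin_pose is not None else State.DETECT_BIN
    if state is State.NAVIGATE:
        return State.GRASP
    if state is State.GRASP:
        return State.VERIFY_GRASP
    if state is State.VERIFY_GRASP:
        # Grasp success is inferred from measured forces on the hands;
        # on failure, retry the grasp rather than aborting.
        return State.PLACE if hand_force_n >= GRASP_FORCE_THRESHOLD_N else State.GRASP
    if state is State.PLACE:
        return State.DONE
    return State.DONE
```

Because each stage keys off fresh perception (the detected bin pose, the measured forces) rather than a fixed script, moving the bins or starting the robot elsewhere doesn't break the rollout—which is the point Reher is making.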

How many takes did it take to get this take?

Reher: We’ve been running this use case consistently for some time now as part of our work in the lab, so we didn’t really have to change much for the filming here. We did two or three practice runs in the morning and then three filming takes. All of the takes were successful, so the extras were to make sure we got the cleanest one to show.

What’s back in the Advanced Actuator Lab?

Reher: We have an awesome team of folks working on some exciting custom actuator designs for our future robots, as well as supporting and characterizing the actuators that went into our current robots.

That’s a very specific number for “speed vs human.” Which human did you measure the robot’s speed against?

Reher: We timed Brett [Adcock, founder of Figure] and a few poor engineers doing the task and took the average to get a rough baseline. If you are observant, that seemingly over-specific number is just saying we’re at 1/6 human speed. The main point that we’re trying to make here is that we are aware we are currently below human speed, and it’s an important metric to track as we improve.

What’s the tether for?

Reher: For this task we currently process the camera data off-robot while all of the behavior planning and control happens onboard in the computer that’s in the torso. Our robots should be fully tetherless in the near future as we finish packaging all of that onboard. We’ve been developing behaviors quickly in the lab here at Figure in parallel to all of the other systems engineering and integration efforts happening, so hopefully folks notice all of these subtle parallel threads converging as we try to release regular updates.

How the heck do you keep your robotics lab so clean?

Reher: Everything we’ve filmed so far is in our large robot test lab, so it’s a lot easier to keep the area clean when people’s desks aren’t intruding in the space. Definitely no guarantees on that level of cleanliness if the camera were pointed in the other direction!

Is the robot in the background doing okay?

Reher: Yes! The other robot was patiently standing there in the background, waiting for the filming to finish up so that our manipulation team could get back to training it to do more manipulation tasks. We hope we can share some more developments with that robot as the main star in the near future.

What would happen if I put a single bowling ball into that tote?

Reher: A bowling ball is particularly menacing to this task, primarily because of its moving mass, in addition to the impact if you throw it in. The robot would in all likelihood drop the tote, stay standing, and abort the task. With what you see here, we assume that the mass of the tote is known a priori so that our whole-body controller can compensate for the external forces while tracking the manipulation task. Reacting to and estimating larger unknown disturbances such as this is a challenging problem, but we’re definitely working on it.
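As a back-of-the-envelope illustration of why a known tote mass helps: with the payload mass and center of mass given, a controller can add a feedforward wrench that exactly cancels the payload's weight. The function name and numbers below are assumptions for illustration; Figure's whole-body controller is of course far more involved. A bowling ball rolling inside would invalidate the static center-of-mass assumption, which is exactly the disturbance-estimation problem Reher mentions:

```python
import numpy as np

G = 9.81  # gravitational acceleration, m/s^2

def payload_feedforward_wrench(mass_kg, com_offset_m):
    """Force/torque the hands must apply to hold a static payload.

    com_offset_m: payload center of mass expressed in the grasp frame.
    Returns the feedforward (force, torque) that cancels the weight.
    """
    weight = np.array([0.0, 0.0, -mass_kg * G])  # gravity acting at the CoM
    force = -weight                              # hands must support the weight
    torque = np.cross(com_offset_m, force)       # cancel the weight's moment
    return force, torque
```

If the true mass or center of mass shifts mid-task, this feedforward term is wrong by exactly the disturbance wrench, and the controller must estimate and react to the difference online.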

Tell me more about that very zen arm and hand pose that the robot adopts after putting the tote on the conveyor.

Reher: It does look kind of zen! If you re-watch our coffee video you’ll notice the same pose after the robot gets things brewing. This is a reset pose that our controller will go into between manipulation tasks while the robot is awaiting commands to execute either an engineered behavior or a learned policy.

Are the fingers less fragile than they look?

Reher: They are more robust than they look, but not impervious to damage by any means. The design is pretty modular which is great, meaning that if we damage one or two fingers there is a small number of parts to swap to get everything back up and running. The current fingers won’t necessarily survive a direct impact from a bad fall, but can pick up totes and do manipulation tasks all day without issues.

Is the Figure logo footsteps?

Reher: One of the reasons I really like the Figure logo is that it has a bunch of different interpretations depending on how you look at it. In some cases it’s just an F that looks like a footstep plan rollout, while some of the logo animations we have look like active stepping. One other possible interpretation could be an occupancy grid.

One of the big challenges in robotics is the generalization necessary for performing unknown tasks in unknown environments on unknown objects. For us humans, this challenge is simplified by the commonsense knowledge we can access. For cognitive robotics, representing and acquiring commonsense knowledge is a relevant problem, so we perform a systematic literature review to investigate the current state of commonsense knowledge exploitation in cognitive robotics. For this review, we combine a keyword search on six search engines with a snowballing search on six related reviews, resulting in 2,048 distinct publications. After applying pre-defined inclusion and exclusion criteria, we analyse the remaining 52 publications. Our focus lies on the use cases and domains for which commonsense knowledge is employed, the commonsense aspects that are considered, the datasets/resources used as sources for commonsense knowledge and the methods for evaluating these approaches. Additionally, we discovered a divide in terminology between research from the knowledge representation and reasoning and the cognitive robotics community. This divide is investigated by looking at the extensive review performed by Zech et al. (The International Journal of Robotics Research, 2019, 38, 518–562), with whom we have no overlapping publications despite the similar goals.

We explore an alternative approach to the design of robots that deviates from the common envisionment of having one unified agent. What if robots are depicted as an agentic ensemble where agency is distributed over different components? In the project presented here, we investigate the potential contributions of this approach to creating entertaining and joyful human-robot interaction (HRI), which also remains comprehensible to human observers. We built a service robot—which takes care of plants as a Plant-Watering Robot (PWR)—that appears as a small ship controlled by a robotic captain accompanied by kinetic elements. The goal of this narrative design, which utilizes a distributed agency approach, is to make the robot entertaining to watch and foster its acceptance. We discuss the robot’s design rationale and present observations from an exploratory study in two contrastive settings, on a university campus and in a care home for people with dementia, using a qualitative video-based approach for analysis. Our observations indicate that such a design has potential regarding the attraction, acceptance, and joyfulness it can evoke. We discuss aspects of this design approach regarding the field of elderly care, limitations of our study, and identify potential fields of use and further scopes for studies.

Objective: For transradial amputees, robotic prosthetic hands promise to regain the capability to perform daily living activities. Current control methods based on physiological signals such as electromyography (EMG) are prone to yielding poor inference outcomes due to motion artifacts, muscle fatigue, and many more. Vision sensors are a major source of information about the environment state and can play a vital role in inferring feasible and intended gestures. However, visual evidence is also susceptible to its own artifacts, most often due to object occlusion, lighting changes, etc. Multimodal evidence fusion using physiological and vision sensor measurements is a natural approach due to the complementary strengths of these modalities.

Methods: In this paper, we present a Bayesian evidence fusion framework for grasp intent inference using eye-view video, eye-gaze, and EMG from the forearm processed by neural network models. We analyze individual and fused performance as a function of time as the hand approaches the object to grasp it. For this purpose, we have also developed novel data processing and augmentation techniques to train neural network components.

Results: Our results indicate that, on average, fusion improves the instantaneous upcoming grasp type classification accuracy while in the reaching phase by 13.66% and 14.8%, relative to EMG (81.64% non-fused) and visual evidence (80.5% non-fused) individually, resulting in an overall fusion accuracy of 95.3%.

Conclusion: Our experimental data analyses demonstrate that EMG and visual evidence show complementary strengths, and as a consequence, fusion of multimodal evidence can outperform each individual evidence modality at any given time.
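For intuition, naive Bayes fusion of two per-class posteriors—assuming EMG and vision are conditionally independent given the grasp type—can be sketched as follows. The class labels and probabilities are made up for illustration; this is a simplified stand-in, not the paper's exact model:

```python
import numpy as np

def fuse(posteriors_emg, posteriors_vision, prior):
    """Combine two per-class posteriors into one fused posterior.

    Naive Bayes: p(c | e1, e2) is proportional to
    p(c | e1) * p(c | e2) / p(c), then normalized over classes.
    """
    p = np.asarray(posteriors_emg) * np.asarray(posteriors_vision) / np.asarray(prior)
    return p / p.sum()

# Illustrative posteriors over three hypothetical grasp types
# (e.g. power, pinch, tripod) with a uniform prior.
emg = np.array([0.6, 0.3, 0.1])
vision = np.array([0.5, 0.2, 0.3])
prior = np.full(3, 1 / 3)

fused = fuse(emg, vision, prior)
```

When both modalities lean toward the same class, the fused posterior is more confident in it than either modality alone—the complementary-strengths effect the authors report.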

The robotics discipline is exploring precise and versatile solutions for upper-limb rehabilitation in Multiple Sclerosis (MS). People with MS can greatly benefit from robotic systems to help combat the complexities of this disease, which can impair the ability to perform activities of daily living (ADLs). In order to present the potential and the limitations of smart mechatronic devices in the mentioned clinical domain, this review is structured to propose a concise SWOT (Strengths, Weaknesses, Opportunities, and Threats) Analysis of robotic rehabilitation in MS. Through the SWOT Analysis, a method mostly adopted in business management, this paper addresses both internal and external factors that can promote or hinder the adoption of upper-limb rehabilitation robots in MS. Subsequently, it discusses how the synergy with another category of interaction technologies - the systems underlying virtual and augmented environments - may empower Strengths, overcome Weaknesses, expand Opportunities, and handle Threats in rehabilitation robotics for MS. The impactful adaptability of these digital settings (extensively used in rehabilitation for MS, even to approach ADL-like tasks in safe simulated contexts) is the main reason for presenting this approach to face the critical issues of the aforementioned SWOT Analysis. This methodological proposal aims at paving the way for devising further synergistic strategies based on the integration of medical robotic devices with other promising technologies to help upper-limb functional recovery in MS.

Creating an accurate model of a user’s skills is an essential task for Intelligent Tutoring Systems (ITS) and robotic tutoring systems. This allows the system to provide personalized help based on the user’s knowledge state. Most user skill modeling systems have focused on simpler tasks such as arithmetic or multiple-choice questions, where the user’s model is only updated upon task completion. These tasks have a single correct answer and they generate an unambiguous observation of the user’s answer. This is not the case for more complex tasks such as programming or engineering tasks, where the user completing the task creates a succession of noisy user observations as they work on different parts of the task. We create an algorithm called Time-Dependent Bayesian Knowledge Tracing (TD-BKT) that tracks users’ skills throughout these more complex tasks. We show in simulation that it has a more accurate model of the user’s skills and, therefore, can select better teaching actions than previous algorithms. Lastly, we show that a robot can use TD-BKT to model a user and teach electronic circuit tasks to participants during a user study. Our results show that participants significantly improved their skills when modeled using TD-BKT.
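For context, classic Bayesian Knowledge Tracing—the model TD-BKT extends to noisy, time-dependent observations—maintains a single "skill known" probability and updates it after each observed attempt. A minimal sketch, with parameter values chosen purely for illustration:

```python
# Illustrative BKT parameters (not from the paper):
P_LEARN = 0.1   # chance of acquiring the skill on each attempt
P_GUESS = 0.2   # correct answer despite not knowing the skill
P_SLIP = 0.1    # wrong answer despite knowing the skill

def bkt_update(p_known, correct):
    """Posterior probability the user knows the skill after one observation."""
    if correct:
        evidence = p_known * (1 - P_SLIP) + (1 - p_known) * P_GUESS
        posterior = p_known * (1 - P_SLIP) / evidence
    else:
        evidence = p_known * P_SLIP + (1 - p_known) * (1 - P_GUESS)
        posterior = p_known * P_SLIP / evidence
    # Account for learning that may occur during the attempt itself.
    return posterior + (1 - posterior) * P_LEARN

p = 0.3
for obs in (True, True, False, True):
    p = bkt_update(p, obs)
```

The extension the paper tackles is that complex tasks like circuit building produce a stream of ambiguous partial observations rather than one clean right/wrong signal per task.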



Video Friday is your weekly selection of awesome robotics videos, collected by your friends at IEEE Spectrum robotics. We also post a weekly calendar of upcoming robotics events for the next few months. Please send us your events for inclusion.

HRI 2024: 11–15 March 2024, BOULDER, COLO.
Eurobot Open 2024: 8–11 May 2024, LA ROCHE-SUR-YON, FRANCE
ICRA 2024: 13–17 May 2024, YOKOHAMA, JAPAN
RoboCup 2024: 17–22 July 2024, EINDHOVEN, NETHERLANDS

Enjoy today’s videos!

Legged robots have the potential to become vital in maintenance, home support, and exploration scenarios. In order to interact with and manipulate their environments, most legged robots are equipped with a dedicated robot arm, which means additional mass and mechanical complexity compared to standard legged robots. In this work, we explore pedipulation—using the legs of a legged robot for manipulation.

This work, by Philip Arm, Mayank Mittal, Hendrik Kolvenbach, and Marco Hutter from ETHZ RSL, will be presented at the IEEE International Conference on Robotics and Automation (ICRA 2024) in May in Japan (see events calendar above).

[ Pedipulate ]

I learned a new word today! “Stigmergy.” Stigmergy is a kind of group coordination that’s based on environmental modification. Like, when insects leave pheromone trails, they’re not directly sending messages to other individuals, but as a group the ants are able to manifest surprisingly complex coordinated behaviors. Cool, right? Researchers at IRIDIA are exploring the possibilities for robots using stigmergy with a cool “artificial pheromone” system using a UV-sensitive surface.
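A toy simulation makes the idea concrete: agents that never message each other can still converge on shared behavior purely by marking and reading a common environment. All the rules and numbers below are invented for illustration—real swarm controllers, like those in the paper, are designed automatically:

```python
import random

random.seed(0)
TRAIL = [0.0] * 20     # shared "pheromone" intensity per cell of a 1-D world
EVAPORATION = 0.9      # trail fades each tick, like a UV mark decaying
DEPOSIT = 1.0          # amount each agent leaves where it stands

def move(pos):
    """Step toward the stronger neighboring trail; wander on ties."""
    left = TRAIL[pos - 1] if pos > 0 else -1.0
    right = TRAIL[pos + 1] if pos < len(TRAIL) - 1 else -1.0
    if left > right:
        return pos - 1
    if right > left:
        return pos + 1
    return max(0, min(len(TRAIL) - 1, pos + random.choice((-1, 1))))

agents = [3, 10, 16]
for _ in range(50):
    for i, pos in enumerate(agents):
        TRAIL[pos] += DEPOSIT   # modify the environment...
        agents[i] = move(pos)   # ...and react only to the environment
    TRAIL[:] = [v * EVAPORATION for v in TRAIL]
```

No agent holds any state about the others; all coordination is mediated by the trail, which is the defining property of stigmergy.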

“Automatic design of stigmergy-based behaviors for robot swarms,” by Muhammad Salman, David Garzón Ramos, and Mauro Birattari, is published in the journal Communications Engineering.

[ Nature ] via [ IRIDIA ]

Thanks, David!

Filmed in July 2017, this video shows Atlas walking through a “hatch” on a pitching surface. This uses autonomous behaviors, with the robot not knowing about the rocking world. Robot built by Boston Dynamics for the DARPA Robotics Challenge in 2013. Software by IHMC Robotics.

[ IHMC ]

That IHMC video reminded me of the SAFFiR program for Shipboard Autonomous Firefighting Robots, which is responsible for a bunch of really cool research in partnership with the United States Naval Research Laboratory. NRL did some interesting stuff with Nexi robots from MIT and made their own videos. I don’t think that effort got nearly enough credit for being very entertaining while communicating important robotics research.

[ NRL ]

I want more robot videos with this energy.

[ MIT CSAIL ]

Large industrial asset operators increasingly use robotics to automate hazardous work at their facilities. This has led to soaring demand for autonomous inspection solutions like ANYmal. Series production by our partner Zollner enables ANYbotics to supply our customers with the required quantities of robots.

[ ANYbotics ]

This week is Grain Bin Safety Week, and Grain Weevil is here to help.

[ Grain Weevil ]

Oof, this is some heavy, heavy deep-time stuff.

[ Onkalo ]

And now, this.

[ RozenZebet ]

Hawkeye is a real time multimodal conversation and interaction agent for the Boston Dynamics’ mobile robot Spot. Leveraging OpenAI’s experimental GPT-4 Turbo and Vision AI models, Hawkeye aims to empower everyone, from seniors to healthcare professionals in forming new and unique interactions with the world around them.

That moment at 1:07 is so relatable.

[ Hawkeye ]

Wing would really prefer that if you find one of their drones on the ground, you don’t run off with it.

[ Wing ]

The rover Artemis, developed at the DFKI Robotics Innovation Center, has been equipped with a penetrometer that measures the soil’s penetration resistance to obtain precise information about soil strength. The video showcases an initial test run with the device mounted on the robot. During this test, the robot was remotely controlled, and the maximum penetration depth was limited to 15 mm.

[ DFKI ]

To efficiently achieve complex humanoid loco-manipulation tasks in industrial contexts, we propose a combined vision-based tracker-localization interplay integrated as part of a task-space whole-body optimization control. Our approach allows humanoid robots, targeted for industrial manufacturing, to manipulate and assemble large-scale objects while walking.

[ Paper ]

We developed a novel multi-body robot (called the Two-Body Bot) consisting of two small-footprint mobile bases connected by a four bar linkage where handlebars are mounted. Each base measures only 29.2 cm wide, making the robot likely the slimmest ever developed for mobile postural assistance.

[ MIT ]

Lex Fridman interviews Marc Raibert.

[ Lex Fridman ]
