Smart haptic gloves are an emerging technology in virtual reality (VR) that promises to enhance sensory feedback in immersive environments. This paper presents one of the first attempts to apply them to surgical training for neurosurgery trainees using VR-based surgery simulators. We develop and evaluate a surgical simulator for external ventricular drain (EVD) placement, a common procedure in neurosurgery. Haptic gloves are combined with a VR environment to augment the experience of burr hole placement and flexible catheter manipulation. The simulator was integrated into the training curriculum at the 2022 Canadian Neurosurgery Rookie Bootcamp, where thirty neurosurgery residents used it while objective performance metrics and subjective experience scores were acquired. We provide the details of the simulator's development, present the user-study results, and draw conclusions about the benefits added by the haptic gloves and about future directions.
Introduction: Image-based heart rate estimation technology offers a contactless approach to healthcare monitoring that could improve the lives of millions of people. To comprehensively test or optimize image-based heart rate extraction methods, a dataset must cover a large number of factors, such as body motion, lighting conditions, and physiological states. However, collecting high-quality datasets with complete coverage of these parameters is a huge challenge.
Methods: In this paper, we introduce a bionic human model based on a three-dimensional (3D) representation of the human body. By integrating a synthetic cardiac signal and involuntary body motion into the 3D model, we render videos on which five well-known traditional and four deep learning iPPG (imaging photoplethysmography) extraction methods are tested.
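For readers unfamiliar with the traditional iPPG family, the sketch below shows the general shape of one of its simplest members: spatially average the green channel over a face region, band-pass the trace to the heart-rate band, and read the heart rate off the dominant spectral peak. This is a minimal illustration of the class of methods being tested, not any specific method from the paper; the frame shape, frame rate, and synthetic test signal are assumptions.

```python
# Minimal sketch of a traditional iPPG pipeline (green-channel averaging).
# Frame source, ROI, and sampling rate are illustrative assumptions.
import numpy as np
from scipy.signal import butter, filtfilt

def estimate_heart_rate(frames, fps=30.0):
    """Estimate heart rate (bpm) from a stack of RGB face-ROI frames.

    frames: array of shape (T, H, W, 3), values in [0, 255].
    """
    # 1. Spatially average the green channel per frame -> raw pulse trace.
    trace = frames[:, :, :, 1].mean(axis=(1, 2))
    # 2. Detrend and band-pass to the plausible heart-rate band (0.7-4 Hz).
    trace = trace - trace.mean()
    b, a = butter(3, [0.7, 4.0], btype="band", fs=fps)
    pulse = filtfilt(b, a, trace)
    # 3. Take the dominant spectral peak as the heart rate.
    spectrum = np.abs(np.fft.rfft(pulse))
    freqs = np.fft.rfftfreq(len(pulse), d=1.0 / fps)
    band = (freqs >= 0.7) & (freqs <= 4.0)
    hr_hz = freqs[band][np.argmax(spectrum[band])]
    return 60.0 * hr_hz

# Example: 10 s of synthetic frames pulsing at 1.2 Hz (72 bpm).
t = np.arange(300) / 30.0
frames = np.full((300, 8, 8, 3), 128.0)
frames[:, :, :, 1] += 2.0 * np.sin(2 * np.pi * 1.2 * t)[:, None, None]
print(estimate_heart_rate(frames))  # ~72
```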
Results: To cover different real-world situations, four common scenarios (stillness, expression/talking, light-source changes, and physical activity) are created for each 3D human. The 3D humans can be built with any appearance and skin tone. A high degree of agreement is achieved between the signals extracted from videos of the synthetic humans and videos of a real human: the performance advantages and disadvantages of the selected iPPG methods are consistent for both real and 3D humans.
Discussion: This technology can generate synthetic humans in various scenarios with precisely controlled parameters and disturbances. It also holds considerable potential for testing and optimizing image-based vital-sign methods in challenging situations where real subjects with reliable ground-truth measurements are difficult to obtain, such as drone-based rescue.
The generative AI revolution embodied in tools like ChatGPT, Midjourney, and many others is at its core based on a simple formula: Take a very large neural network, train it on a huge dataset scraped from the Web, and then use it to fulfill a broad range of user requests. Large language models (LLMs) can answer questions, write code, and spout poetry, while image-generating systems can create convincing cave paintings or contemporary art.
So why haven’t these amazing AI capabilities translated into the kinds of helpful and broadly useful robots we’ve seen in science fiction? Where are the robots that can clean off the table, fold your laundry, and make you breakfast?
Unfortunately, the highly successful generative AI formula—big models trained on lots of Internet-sourced data—doesn’t easily carry over into robotics, because the Internet is not full of robotic-interaction data in the same way that it’s full of text and images. Robots need robot data to learn from, and this data is typically created slowly and tediously by researchers in laboratory environments for very specific tasks. Despite tremendous progress on robot-learning algorithms, without abundant data we still can’t enable robots to perform real-world tasks (like making breakfast) outside the lab. The most impressive results typically only work in a single laboratory, on a single robot, and often involve only a handful of behaviors.
If the abilities of each robot are limited by the time and effort it takes to manually teach it to perform a new task, what if we were to pool together the experiences of many robots, so a new robot could learn from all of them at once? We decided to give it a try. In 2023, our labs at Google and the University of California, Berkeley came together with 32 other robotics laboratories in North America, Europe, and Asia to undertake the RT-X project, with the goal of assembling data, resources, and code to make general-purpose robots a reality.
Here is what we learned from the first phase of this effort.
How to create a generalist robot

Humans are far better at this kind of learning. Our brains can, with a little practice, handle what are essentially changes to our body plan, which happens when we pick up a tool, ride a bicycle, or get in a car. That is, our “embodiment” changes, but our brains adapt. RT-X is aiming for something similar in robots: to enable a single deep neural network to control many different types of robots, a capability called cross-embodiment. The question is whether a deep neural network trained on data from a sufficiently large number of different robots can learn to “drive” all of them—even robots with very different appearances, physical properties, and capabilities. If so, this approach could potentially unlock the power of large datasets for robotic learning.
The scale of this project is very large because it has to be. The RT-X dataset currently contains nearly a million robotic trials for 22 types of robots, including many of the most commonly used robotic arms on the market. The robots in this dataset perform a huge range of behaviors, including picking and placing objects, assembly, and specialized tasks like cable routing. In total, there are about 500 different skills and interactions with thousands of different objects. It’s the largest open-source dataset of real robotic actions in existence.
Surprisingly, we found that our multirobot data could be used with relatively simple machine-learning methods, provided that we follow the recipe of using large neural-network models with large datasets. Leveraging the same kinds of models used in current LLMs like ChatGPT, we were able to train robot-control algorithms that do not require any special features for cross-embodiment. Much like a person can drive a car or ride a bicycle using the same brain, a model trained on the RT-X dataset can simply recognize what kind of robot it’s controlling from what it sees in the robot’s own camera observations. If the robot’s camera sees a UR10 industrial arm, the model sends commands appropriate to a UR10. If the model instead sees a low-cost WidowX hobbyist arm, the model moves it accordingly.
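To make the cross-embodiment idea concrete, here is a minimal sketch of a policy that receives only camera pixels and an instruction embedding. This is not the actual RT-X architecture; the layer sizes, the 7-dimensional action space, and the 256-bin discretization are illustrative assumptions. Because no robot-type label is given, any embodiment-specific behavior has to be inferred from the image, just as described above.

```python
# Minimal PyTorch sketch of a cross-embodiment policy: camera image plus
# instruction in, discretized actions out, with no explicit robot-type input.
import torch
import torch.nn as nn

class CrossEmbodimentPolicy(nn.Module):
    def __init__(self, instr_dim=64, action_dim=7, bins=256):
        super().__init__()
        self.encoder = nn.Sequential(           # tiny stand-in for a ViT/CNN
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Sequential(
            nn.Linear(32 + instr_dim, 128), nn.ReLU(),
            nn.Linear(128, action_dim * bins),  # logits over action bins
        )
        self.action_dim, self.bins = action_dim, bins

    def forward(self, image, instruction):
        z = torch.cat([self.encoder(image), instruction], dim=-1)
        return self.head(z).view(-1, self.action_dim, self.bins)

policy = CrossEmbodimentPolicy()
image = torch.rand(1, 3, 224, 224)       # the robot's own camera view
instruction = torch.rand(1, 64)          # embedded text command (assumed)
action_bins = policy(image, instruction).argmax(dim=-1)
print(action_bins.shape)                 # torch.Size([1, 7])
```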
To test the capabilities of our model, five of the laboratories involved in the RT-X collaboration each tested it in a head-to-head comparison against the best control system they had developed independently for their own robot. Each lab’s test involved the tasks it was using for its own research, which included things like picking up and moving objects, opening doors, and routing cables through clips. Remarkably, the single unified model provided improved performance over each laboratory’s own best method, succeeding at the tasks about 50 percent more often on average.
While this result might seem surprising, we found that the RT-X controller could leverage the diverse experiences of other robots to improve robustness in different settings. Even within the same laboratory, every time a robot attempts a task, it finds itself in a slightly different situation, and so drawing on the experiences of other robots in other situations helped the RT-X controller with natural variability and edge cases. Here are a few examples of the range of these tasks:
Building robots that can reason
Encouraged by our success with combining data from many robot types, we next sought to investigate how such data can be incorporated into a system with more in-depth reasoning capabilities. Complex semantic reasoning is hard to learn from robot data alone. While the robot data can provide a range of physical capabilities, more complex tasks like “Move apple between can and orange” also require understanding the semantic relationships between objects in an image, basic common sense, and other symbolic knowledge that is not directly related to the robot’s physical capabilities.
So we decided to add another massive source of data to the mix: Internet-scale image and text data. We used an existing large vision-language model that is already proficient at many tasks that require some understanding of the connection between natural language and images. The model is similar to the ones available to the public such as ChatGPT or Bard. These models are trained to output text in response to prompts containing images, allowing them to solve problems such as visual question-answering, captioning, and other open-ended visual understanding tasks. We discovered that such models can be adapted to robotic control simply by training them to also output robot actions in response to prompts framed as robotic commands (such as “Put the banana on the plate”). We applied this approach to the robotics data from the RT-X collaboration.
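The core trick described above, treating robot actions as just another kind of text, can be illustrated with the discretization step alone. The sketch below quantizes a continuous action vector into 256 integer tokens and back; the bin count, the normalized action range, and the 7-DoF layout are assumptions for illustration, not the collaboration's exact scheme.

```python
# Sketch of action-as-token quantization: continuous robot actions are
# mapped to a small vocabulary of integers a language model can emit.
import numpy as np

BINS = 256
LOW, HIGH = -1.0, 1.0  # assumed normalized action range per dimension

def actions_to_tokens(action):
    """Map a continuous action vector to integer tokens in [0, BINS-1]."""
    clipped = np.clip(action, LOW, HIGH)
    return np.round((clipped - LOW) / (HIGH - LOW) * (BINS - 1)).astype(int)

def tokens_to_actions(tokens):
    """Invert the quantization back to (approximate) continuous actions."""
    return LOW + tokens / (BINS - 1) * (HIGH - LOW)

# A hypothetical 7-DoF arm command: position deltas, rotation deltas, gripper.
action = np.array([0.02, -0.10, 0.00, 0.05, 0.00, -0.30, 1.00])
tokens = actions_to_tokens(action)
print(tokens)                      # [130 115 128 134 128  89 255]
print(tokens_to_actions(tokens))   # close to the original action
```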
The RT-X model uses images or text descriptions of specific robot arms doing different tasks to output a series of discrete actions that will allow any robot arm to do those tasks. By collecting data from many robots doing many tasks from robotics labs around the world, we are building an open-source dataset that can be used to teach robots to be generally useful.

Chris Philpot
To evaluate the combination of Internet-acquired smarts and multirobot data, we tested our RT-X model with Google’s mobile manipulator robot. We gave it our hardest generalization benchmark tests. The robot had to recognize objects and successfully manipulate them, and it also had to respond to complex text commands by making logical inferences that required integrating information from both text and images. The latter is one of the things that make humans such good generalists. Could we give our robots at least a hint of such capabilities?
Even without specific training, this Google research robot is able to follow the instruction “move apple between can and orange.” This capability is enabled by RT-X, a large robotic manipulation dataset and the first step towards a general robotic brain.
We conducted two sets of evaluations. As a baseline, we used a model that excluded all of the generalized multirobot RT-X data that didn’t involve Google’s robot. Google’s robot-specific dataset is in fact the largest part of the RT-X dataset, with over 100,000 demonstrations, so the question of whether all the other multirobot data would actually help in this case was very much open. Then we tried again with all that multirobot data included.
In one of the most difficult evaluation scenarios, the Google robot needed to accomplish a task that involved reasoning about spatial relations (“Move apple between can and orange”); in another task it had to solve rudimentary math problems (“Place an object on top of a paper with the solution to ‘2+3’”). These challenges were meant to test the crucial capabilities of reasoning and drawing conclusions.
In this case, the reasoning capabilities (such as the meaning of “between” and “on top of”) came from the Web-scale data included in the training of the vision-language model, while the ability to ground the reasoning outputs in robotic behaviors (commands that actually moved the robot arm in the right direction) came from training on cross-embodiment robot data from RT-X. Some examples of evaluations where we asked the robots to perform tasks not included in their training data are shown below.

While these tasks are rudimentary for humans, they present a major challenge for general-purpose robots. Without robotic demonstration data that clearly illustrates concepts like “between,” “near,” and “on top of,” even a system trained on data from many different robots would not be able to figure out what these commands mean. By integrating Web-scale knowledge from the vision-language model, our complete system was able to solve such tasks, deriving the semantic concepts (in this case, spatial relations) from Internet-scale training, and the physical behaviors (picking up and moving objects) from multirobot RT-X data.

To our surprise, we found that the inclusion of the multirobot data improved the Google robot’s ability to generalize to such tasks by a factor of three. This result suggests that not only was the multirobot RT-X data useful for acquiring a variety of physical skills, it could also help to better connect such skills to the semantic and symbolic knowledge in vision-language models. These connections give the robot a degree of common sense, which could one day enable robots to understand the meaning of complex and nuanced user commands like “Bring me my breakfast” while carrying out the actions to make it happen.
The next steps for RT-X

The RT-X project shows what is possible when the robot-learning community acts together. Because of this cross-institutional effort, we were able to put together a diverse robotic dataset and carry out comprehensive multirobot evaluations that wouldn’t be possible at any single institution. Since the robotics community can’t rely on scraping the Internet for training data, we need to create that data ourselves. We hope that more researchers will contribute their data to the RT-X database and join this collaborative effort. We also hope to provide tools, models, and infrastructure to support cross-embodiment research. We plan to go beyond sharing data across labs, and we hope that RT-X will grow into a collaborative effort to develop data standards, reusable models, and new techniques and algorithms.
Our early results hint at how large cross-embodiment robotics models could transform the field. Much as large language models have mastered a wide range of language-based tasks, in the future we might use the same foundation model as the basis for many real-world robotic tasks. Perhaps new robotic skills could be enabled by fine-tuning or even prompting a pretrained foundation model. In a similar way to how you can prompt ChatGPT to tell a story without first training it on that particular story, you could ask a robot to write “Happy Birthday” on a cake without having to tell it how to use a piping bag or what handwritten text looks like. Of course, much more research is needed for these models to take on that kind of general capability, as our experiments have focused on single arms with two-finger grippers doing simple manipulation tasks.
As more labs engage in cross-embodiment research, we hope to further push the frontier on what is possible with a single neural network that can control many robots. These advances might include adding diverse simulated data from generated environments, handling robots with different numbers of arms or fingers, using different sensor suites (such as depth cameras and tactile sensing), and even combining manipulation and locomotion behaviors. RT-X has opened the door for such work, but the most exciting technical developments are still ahead.
This is just the beginning. We hope that with this first step, we can together create the future of robotics: where general robotic brains can power any robot, benefiting from data shared by all robots around the world.
Musculoskeletal models provide an approach to simulating the capabilities of the human body in a variety of human-robot applications. One promising use is modelling the body's physical capabilities, for example, estimating strength at the hand. Several methods of modelling and representing human strength with musculoskeletal models have been used in ergonomic analysis, human-robot interaction, and robotic assistance. However, it is currently unclear which methods best suit modelling and representing limb strength. This paper compares existing methods for calculating and representing the strength of the upper limb using musculoskeletal models. It then details the differences and relative advantages of the existing methods, enabling discussion of each method's appropriateness for particular applications.
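As one concrete example of the kind of strength representation such papers compare, the sketch below computes a force ellipsoid at the hand by mapping a unit ball of joint torques through the arm's Jacobian. A planar two-link arm stands in for a full musculoskeletal upper-limb model here, and the link lengths and joint angles are illustrative assumptions.

```python
# Hand-force ellipsoid from joint-torque limits via f = J^{-T} tau,
# one common way to represent limb strength at the end effector.
import numpy as np

def planar_2link_jacobian(q1, q2, l1=0.3, l2=0.25):
    """Geometric Jacobian of a planar 2-link arm (shoulder, elbow)."""
    s1, c1 = np.sin(q1), np.cos(q1)
    s12, c12 = np.sin(q1 + q2), np.cos(q1 + q2)
    return np.array([[-l1 * s1 - l2 * s12, -l2 * s12],
                     [ l1 * c1 + l2 * c12,  l2 * c12]])

def force_ellipsoid(J):
    """Axes of the hand-force ellipsoid for joint torques with ||tau|| <= 1.

    Since f = J^{-T} tau, the ellipsoid's principal directions are the left
    singular vectors of J^{-T}, scaled by its singular values.
    """
    J_inv_T = np.linalg.inv(J).T
    U, s, _ = np.linalg.svd(J_inv_T)
    return U, s  # directions (columns of U) and gains (N per N*m)

J = planar_2link_jacobian(q1=np.deg2rad(40), q2=np.deg2rad(70))
directions, magnitudes = force_ellipsoid(J)
print("strongest direction:", directions[:, 0], "gain:", magnitudes[0])
print("weakest   direction:", directions[:, 1], "gain:", magnitudes[1])
```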
Introduction: Recent years have seen a surge in the use of social robots for providing information, persuasion, and entertainment in noisy public spaces. Given the well-documented negative effect of noise on human cognition, masking sounds have been introduced. Masking sounds work, in principle, by making intrusive background speech less intelligible and hence less distracting. However, this reduced distraction comes at the cost of increased annoyance and reduced cognitive performance for the users of masking sounds.
Methods: A previous study showed that reducing the fundamental frequency of speech-shaped noise used as a masking sound makes it significantly less annoying and more efficient. In this study, the effectiveness of the proposed masking sound was tested on the performance of subjects listening to a lecture given by a social robot in a noisy cocktail-party environment.
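For context, a common way to build a speech-shaped masker is to shape white noise toward the long-term average spectrum of speech, as in the hedged sketch below. The simple spectral tilt used here is an illustrative approximation, and the study's fundamental-frequency manipulation is not reproduced.

```python
# Hedged sketch: speech-shaped masking noise via spectral shaping of white
# noise. The flat-then-rolloff spectrum is a rough stand-in for a measured
# long-term average speech spectrum.
import numpy as np

def speech_shaped_noise(duration_s=2.0, fs=16000):
    n = int(duration_s * fs)
    spectrum = np.fft.rfft(np.random.default_rng(0).normal(size=n))
    freqs = np.fft.rfftfreq(n, 1.0 / fs)
    # Flat up to ~500 Hz, then roll off ~ -6 dB/octave, like speech on average.
    gain = np.where(freqs < 500.0, 1.0, 500.0 / np.maximum(freqs, 1e-9))
    noise = np.fft.irfft(spectrum * gain, n)
    return noise / np.max(np.abs(noise))     # normalize to +/-1

masker = speech_shaped_noise()
print(masker.shape)  # (32000,) samples ready for playback at 16 kHz
```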
Results: The results indicate that the presence of the masking sound significantly increased speech comprehension, perceived understandability, acoustic satisfaction, and sound privacy of the individuals listening to the robot in an adverse listening condition.
Discussion: To the knowledge of the authors, no previous work has investigated the application of sound masking technology in human-robot interaction designs. The future directions of this trend are discussed.
In current telerobotics and telemanipulator applications, operators must perform a wide variety of tasks, often with a high risk associated with failure. A system designed to generate data-based behavioural estimations from observed operator features could be used to reduce risks in industrial teleoperation. This paper describes a non-invasive bio-mechanical feature capture method for teleoperators, used to trial novel human-error rate estimators which, in future work, are intended to improve operational safety by providing behavioural and postural feedback to the operator. Operator monitoring studies were conducted in situ using the MASCOT teleoperation system at UKAEA RACE; the operators were given controlled tasks to complete during observation. Building upon existing works for vehicle-driver intention estimation and robotic-surgery operator analysis, we used 3D point-cloud data captured with a commercially available depth camera to estimate each operator's skeletal pose. A total of 14 operators were observed and recorded for approximately 8 h in total, each completing a baseline task and a task designed to induce detectable but safe collisions. Skeletal pose was estimated, collision statistics were recorded, and questionnaire-based psychological assessments were made, providing a database of qualitative and quantitative data. We then trialled data-driven analysis, using statistical and machine-learning regression techniques (support vector regression, SVR) to estimate collision rates. We also present an input-variable sensitivity analysis for our selected features.
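As a flavor of the final analysis step, the sketch below fits a support vector regressor from per-operator pose features to collision rates using scikit-learn. The feature choices and synthetic data are illustrative assumptions; the study's actual features came from depth-camera skeletal-pose estimates and recorded collision statistics.

```python
# Hedged sketch: SVR from pose-derived features to collision rate.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)
# Hypothetical per-operator features: mean torso lean, wrist speed,
# posture variance (three columns, one row per operator).
X = rng.normal(size=(14, 3))                 # 14 observed operators
collision_rate = 0.5 * X[:, 1] + 0.2 * X[:, 2] + rng.normal(0.0, 0.1, 14)

model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0))
model.fit(X, collision_rate)
print(model.predict(X[:3]))  # in-sample predictions for the first operators
```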
Video Friday is your weekly selection of awesome robotics videos, collected by your friends at IEEE Spectrum robotics. We also post a weekly calendar of upcoming robotics events for the next few months. Please send us your events for inclusion.
Enjoy today’s videos!
One approach to robot autonomy is to learn from human demonstration, which can be very effective as long as you have enough high-quality data to work with. Mobile ALOHA is a low-cost, whole-body teleoperation system for data collection from Stanford’s IRIS Lab, and under the control of an experienced human, it can do pretty much everything we’ve ever fantasized about home robots doing for us.
[ Stanford ]
Researchers at Harvard SEAS and Boston University’s Sargent College of Health & Rehabilitation Sciences used a soft, wearable robot to help a person living with Parkinson’s walk without freezing. The robotic garment, worn around the hips and thighs, gives a gentle push to the hips as the leg swings, helping the patient achieve a longer stride. The research demonstrates the potential of soft robotics to treat a potentially dangerous symptom of Parkinson’s disease and could allow people living with the disease to regain their mobility and independence.

[ Harvard SEAS ]
Happy 2024 from SkyMul!
[ SkyMul ]
Thanks, Eohan!
As the holiday season approaches, we at Kawasaki Robotics (USA), Inc. wanted to take a moment to express our warmest wishes to you. May your holidays be filled with joy, love, and peace, and may the New Year bring you prosperity, success, and happiness. From our team to yours, we wish you a very happy holiday season and a wonderful New Year ahead.

Aurora Flight Sciences is working on a new X-plane for the Defense Advanced Research Projects Agency’s (DARPA) Control of Revolutionary Aircraft with Novel Effectors (CRANE) program. X-65 is purpose-designed for testing and demonstrating the benefits of active flow control (AFC) at tactically relevant scale and flight conditions.

[ Aurora ]
Well, this is the craziest piece of immersive robotic teleop hardware I’ve ever seen.
[ Jinkisha ]
Looks like Moley Robotics is still working on the least practical robotic kitchen ever.
[ Moley ]
Introduction: Preventive control is a critical feature of autonomous technology for ensuring safe system operation. One application where safety matters most is robot-assisted needle intervention. During incisions into tissue, adverse events such as mechanical buckling of the needle shaft and tissue displacement can occur on encountering stiff membranes, potentially damaging the organ.
Methods: To prevent these events before they occur, we propose a new control subroutine that autonomously chooses (a) a reactive mechanism to stop the insertion procedure when needle buckling or a severe tissue-displacement event is predicted and (b) an adaptive mechanism to continue the insertion procedure through needle-steering control when a mild tissue displacement is detected. The subroutine is developed using a model-free control technique owing to the nonlinearities of the unknown needle-tissue dynamics. First, an improved version of model-free adaptive control (IMFAC) is developed by computing a fast time-varying pseudo-partial derivative analytically from the dynamic linearization equation, to enhance output convergence and robustness against external disturbances.
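For orientation, the sketch below implements the baseline compact-form MFAC loop that IMFAC builds on: a pseudo-partial derivative (PPD) is estimated online from input/output increments alone and used in a model-free control law, so no needle-tissue model is required. The gains, the reset rule, and the toy plant standing in for needle-tissue dynamics are assumptions; the paper's analytic IMFAC update is not reproduced here.

```python
# Hedged sketch of compact-form model-free adaptive control (MFAC).
import numpy as np

def mfac_track(y_ref, steps=200, eta=0.8, mu=1.0, rho=0.6, lam=1.0):
    y, u = np.zeros(steps + 1), np.zeros(steps + 1)
    phi = np.full(steps + 1, 0.5)            # PPD estimate phi_hat(k)
    for k in range(1, steps):
        du, dy = u[k] - u[k - 1], y[k] - y[k - 1]
        # PPD estimation from the dynamic linearization dy = phi * du.
        phi[k] = phi[k - 1] + eta * du / (mu + du**2) * (dy - phi[k - 1] * du)
        if abs(phi[k]) < 1e-5:               # standard reset for robustness
            phi[k] = 0.5
        # Model-free control law driving y toward the reference.
        u[k + 1] = u[k] + rho * phi[k] / (lam + phi[k] ** 2) * (y_ref - y[k])
        # Unknown nonlinear "plant" standing in for needle-tissue dynamics.
        y[k + 1] = 0.8 * y[k] + 0.4 * np.tanh(u[k + 1])
    return y

print(mfac_track(y_ref=1.0)[-5:])  # settles near the 1.0 reference
```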
Results and Discussion: Comparing the IMFAC and MFAC algorithms on simulated nonlinear systems in MATLAB, IMFAC shows 20% faster output convergence in the presence of arbitrary disturbances. Next, IMFAC is integrated with event prediction algorithms from prior work to prevent adverse events during needle insertions in real time. Needle insertions in gelatin tissues in known environments show successful prevention of needle buckling and tissue-displacement events. Needle insertions in biological tissues in unknown environments are performed using live fluoroscopic imaging as ground truth to verify timely prevention of adverse events. Finally, statistical ANOVA on all insertion data shows the robustness of the prevention algorithm across various needles and tissue environments. Overall, the success rate of preventing adverse events in needle insertions through adaptive and reactive control was 95%, an important step toward safety in robotic needle interventions.
Introduction: Human–robot teams are being called upon to accomplish increasingly complex tasks. During execution, the robot may operate at different levels of autonomy (LOAs), ranging from full robotic autonomy to full human control. For any number of reasons, such as changes in the robot’s surroundings due to the complexities of operating in dynamic and uncertain environments, degradation and damage to the robot platform, or changes in tasking, adjusting the LOA during operations may be necessary to achieve desired mission outcomes. Thus, a critical challenge is understanding when and how the autonomy should be adjusted.
Methods: We frame this problem with respect to the robot’s capabilities and limitations, known as robot competency. With this framing, a robot could be granted a level of autonomy in line with its ability to operate with a high degree of competence. First, we propose a Model Quality Assessment metric, which indicates how (un)expected an autonomous robot’s observations are compared to its model predictions. Next, we present an Event-Triggered Generalized Outcome Assessment (ET-GOA) algorithm that uses changes in the Model Quality Assessment above a threshold to selectively execute and report a high-level assessment of the robot’s competency. We validated the Model Quality Assessment metric and the ET-GOA algorithm in both simulated and live robot navigation scenarios.
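A minimal sketch of the event-triggered pattern is shown below: a cheap model-quality signal is computed every step, and the expensive outcome assessment runs only when that signal has degraded past a threshold. The Gaussian surprise metric, the rollout-based assessment, and the toy drift scenario are illustrative assumptions rather than the paper's exact formulations.

```python
# Hedged sketch of event-triggered competency assessment.
import numpy as np

def model_quality(predicted, observed, sigma=0.5):
    """Higher = the observation is well explained by the model prediction."""
    err = np.linalg.norm(observed - predicted)
    return float(np.exp(-0.5 * (err / sigma) ** 2))

def outcome_assessment(dynamics, state, goal, n_rollouts=100, tol=0.5):
    """Expensive check: estimated probability of still reaching the goal."""
    hits = [np.linalg.norm(dynamics(state) - goal) < tol
            for _ in range(n_rollouts)]
    return float(np.mean(hits))

rng = np.random.default_rng(1)
dynamics = lambda s: 0.5 * s + rng.normal(0.0, 0.2, size=2)  # toy model
goal, threshold = np.zeros(2), 0.4

for step in range(5):
    predicted = np.zeros(2)                  # the model expects no drift
    observed = np.array([0.2, 0.1]) * step   # but the world drifts away
    mq = model_quality(predicted, observed)
    if 1.0 - mq > threshold:                 # event trigger: quality dropped
        conf = outcome_assessment(dynamics, observed, goal)
        print(f"step {step}: MQA={mq:.2f}, reassessed goal confidence={conf:.2f}")
```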
Results: Our experiments found that the Model Quality Assessment was able to respond to unexpected observations. Additionally, our validation of the full ET-GOA algorithm explored how the algorithm's computational cost and accuracy were affected across several Model Quality triggering thresholds and differing amounts of state perturbation.
Discussion: Our experimental results, combined with a human-in-the-loop demonstration, show that the Event-Triggered Generalized Outcome Assessment algorithm can facilitate informed autonomy-adjustment decisions based on a robot's task competency.
Soft pneumatic artificial muscles are a well-established actuation scheme in soft robotics thanks to key features that make robotic machines safe, lightweight, and conformable. In this work, we present a versatile vacuum-powered artificial muscle (VPAM) with manually tunable output motion. The muscle consists of a stack of air chambers fitted with replaceable external reinforcements. Different modes of operation are achieved by assembling different reinforcements that constrain the actuator's output motion during actuation. We designed replaceable external reinforcements to produce single motions such as twisting, bending, shearing, and rotation, and we conducted deformation and lifting-force characterizations for these motions. We demonstrated sophisticated motions and the reusability of the artificial muscle in two soft machines with different modes of locomotion. Our results show that the VPAM is reusable and versatile, producing a variety of sophisticated output motions as needed. This feature especially benefits unpredictable workspaces that require a soft actuator that can be adjusted for other tasks. Our scheme has the potential to offer new locomotion strategies for underwater or terrestrial machines, and for wearable devices with different modes of operation.
In control theory, reactive methods have been widely celebrated owing to their success in providing robust, provably convergent solutions to control problems. Even though such methods have long been formulated for motion planning, optimality has largely been left untreated by reactive means, with the community focusing on discrete, graph-based solutions. Although the latter exhibit certain advantages (completeness, handling of complicated state spaces), the recent rise of Reinforcement Learning (RL) provides novel ways to address the limitations of reactive methods. The goal of this paper is to treat the reactive optimal motion-planning problem through an RL framework. A policy-iteration RL scheme is formulated consistently with the control-theoretic results, utilizing the advantages of each approach in a complementary way: RL is employed to construct the optimal input without necessitating the solution of a hard, nonlinear partial differential equation, while safety, convergence, and policy improvement are guaranteed through control-theoretic arguments. The proposed method is validated in simulated synthetic workspaces and compared against reactive methods as well as PRM and RRT⋆ approaches. It outperforms or closely matches the latter, indicating near-global optimality, while providing a solution for planning from anywhere within the workspace to the goal position.
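To illustrate the policy-iteration backbone of such a scheme, the sketch below alternates policy evaluation and greedy improvement on a toy discretized workspace, yielding a feedback policy that reaches the goal from any start state. The grid, costs, and discount factor are assumptions; the paper's method operates in continuous workspaces with control-theoretic guarantees that this tabular toy does not capture.

```python
# Tabular policy iteration on a toy gridworld workspace with obstacles.
import numpy as np

N, GOAL, OBSTACLES = 5, (4, 4), {(1, 1), (2, 3), (3, 1)}
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]
GAMMA = 0.95

def step(s, a):
    nxt = (s[0] + a[0], s[1] + a[1])
    if not (0 <= nxt[0] < N and 0 <= nxt[1] < N) or nxt in OBSTACLES:
        nxt = s                        # bounce off walls and obstacles
    return nxt, (0.0 if nxt == GOAL else -1.0)

states = [(i, j) for i in range(N) for j in range(N) if (i, j) not in OBSTACLES]
policy = {s: 0 for s in states}
V = {s: 0.0 for s in states}

for _ in range(50):                    # policy iteration loop
    for _ in range(50):                # policy evaluation (fixed policy)
        for s in states:
            if s == GOAL:
                V[s] = 0.0
                continue
            nxt, r = step(s, ACTIONS[policy[s]])
            V[s] = r + GAMMA * V[nxt]
    for s in states:                   # greedy policy improvement
        q = [step(s, a)[1] + GAMMA * V[step(s, a)[0]] for a in ACTIONS]
        policy[s] = int(np.argmax(q))

# Roll the learned feedback policy out from an arbitrary start state.
s = (0, 0)
for _ in range(12):
    if s == GOAL:
        break
    s, _ = step(s, ACTIONS[policy[s]])
print("reached:", s)                   # (4, 4)
```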
As IEEE Spectrum editors, we pride ourselves on spotting promising technologies and following them from the research phase through development and ultimately deployment. In every January issue, we focus on the technologies that are now poised to achieve significant milestones in the new year.
This issue was curated by Senior Editor Samuel K. Moore, our in-house expert on semiconductors. So it’s no surprise that he included a story on Intel’s plan to roll out two momentous chip technologies in the next few months.
For “Intel Hopes to Leapfrog Its Competitors,” Moore directed our editorial intern, Gwendolyn Rak, to report on the risk the chip giant is taking by introducing two technologies at once. We began tracking the first technology, nanosheet transistors, in 2017. By the time we gave all the details in a 2019 feature article, it was clear that this device was destined to be the successor to the FinFET. Moore first spotted the second technology, back-side power delivery, at the IEEE International Electron Devices Meeting in 2019. Less than two years later, Intel publicly committed to incorporating the tech in 2024.
Speaking of commitment, the U.S. military’s Defense Advanced Research Projects Agency has played an enormous part in bankrolling some of the fundamental advances that appear in these pages. Many of our readers will be familiar with the robots that Senior Editor Evan Ackerman covered during DARPA’s humanoid challenge almost 10 years ago. Those robots were essentially research projects, but as Ackerman reports in “Year of the Humanoid,” a few companies will start up pilot projects in 2024 to see if this generation of humanoids is ready to roll up its metaphorical sleeves and get down to business.
More recently, fully homomorphic encryption (FHE) has burst onto the scene. Moore, who’s been covering the Cambrian explosion in chip architectures for AI and other alternative computing modalities since the mid-2010s, notes that, as with the robotics challenge, DARPA was the initial driver.
“You’d expect the three companies DARPA funded to come up with a chip, though there was no guarantee they’d commercialize it,” says Moore, who wrote “Chips to Compute With Encrypted Data Are Coming.” “But what you wouldn’t expect is three more startups, independently of DARPA, to come out with their own FHE chips at the same time.”
Senior Editor Tekla S. Perry’s story about phosphorescent OLEDs, “A Behind-the-Screens Change for OLED,” is actually a deep cut for us. One of the first feature articles Moore edited at Spectrum way back in 2000 was Stephen Forrest’s article on organic electronics. His lab developed the first phosphorescent OLED materials, which are hugely more efficient than the fluorescent ones. Forrest was a founder of Universal Display Corp., which has now, after more than two decades, finally commercialized the last of its trio of phosphorescent colors—blue.
Then there’s our cover story about deepfakes and their potential impact on dozens of national elections later this year. We’ve been tracking the rise of deepfakes since mid-2018, when we ran a story about AI researchers betting on whether or not a deepfake video about a political candidate would receive more than 2 million views during the U.S. midterm elections that year. As Senior Editor Eliza Strickland reports in “This Election Year, Look for Content Credentials,” several companies and industry groups are working hard to ensure that deepfakes don’t take down democracy.
Best wishes for a healthy and prosperous new year, and enjoy this year’s technology forecast. It’s been years in the making.
This article appears in the January 2024 print issue.
This story is part of our Top Tech 2024 special report.
Journey to the Center of the Earth

To unlock the terawatt potential of geothermal energy, MIT startup Quaise Energy is testing a deep-drilling rig in 2024 that will use high-power millimeter waves to melt a column of rock down as far as 10 to 20 kilometers. Its “deeper, hotter, and faster” strategy will start with old oil-and-gas drilling structures and extend them by blasting radiation from a gyrotron to vaporize the hard rock beneath. At these depths, Earth reaches 500 °C. Accessing this superhot geothermal energy could be a key part of achieving net zero emission goals by 2050, according to Quaise executives.
“Batteries Included” Induction Ovens
Now we’re cooking with gas—but soon, we may be cooking with induction. A growing number of consumers are switching to induction-based stoves and ovens to address environmental concerns and health risks associated with gas ranges. But while these new appliances are more energy efficient, most models require modified electrical outlets and cost hundreds of dollars to install. That’s why startups like Channing Street Copper and Impulse Labs are working to make induction ovens easier to install by adding built-in batteries that supplement regular wall-socket power. Channing Street Copper plans to roll out its battery-boosted Charlie appliance in early 2024.
Triage Tech to the Rescue
In the second half of 2024, the U.S. Defense Advanced Research Projects Agency will begin the first round of its Triage Challenge, a competition to develop sensors and algorithms to support triage efforts during mass-casualty incidents. According to a DARPA video presentation from last February, the agency is seeking new ways to help medics at two stages of treatment: During primary triage, those most in need of care will be identified with sensors from afar. Then, when the patients are stable, medics can decide the best treatment regimens based on data gleaned from noninvasive sensors. The three rounds will continue through 2026, with prizes totaling US $7 million.
Killer Drones Deployed From the Skies
This story is part of our Top Tech 2024 special report.
Journey to the Center of the Earth
To unlock the terawatt potential of geothermal energy, MIT startup Quaise Energy is testing a deep-drilling rig in 2024 that will use high-power millimeter waves to melt a column of rock down as far as 10 to 20 kilometers. Its “deeper, hotter, and faster” strategy will start with old oil-and-gas drilling structures and extend them by blasting radiation from a gyrotron to vaporize the hard rock beneath. At these depths, Earth reaches 500 °C. Accessing this superhot geothermal energy could be a key part of achieving net zero emission goals by 2050, according to Quaise executives.
“Batteries Included” Induction Ovens
Now we’re cooking with gas—but soon, we may be cooking with induction. A growing number of consumers are switching to induction-based stoves and ovens to address environmental concerns and health risks associated with gas ranges. But while these new appliances are more energy efficient, most models require modified electrical outlets and cost hundreds of dollars to install. That’s why startups like Channing Street Copper and Impulse Labs are working to make induction ovens easier to install by adding built-in batteries that supplement regular wall-socket power. Channing Street Copper plans to roll out its battery-boosted Charlie appliance in early 2024.
Triage Tech to the Rescue
In the second half of 2024, the U.S. Defense Advanced Research Projects Agency will begin the first round of its Triage Challenge, a competition to develop sensors and algorithms to support triage efforts during mass-casualty incidents. According to a DARPA video presentation from last February, the agency is seeking new ways to help medics at two stages of treatment: During primary triage, those most in need of care will be identified with sensors from afar. Then, when the patients are stable, medics can decide the best treatment regimens based on data gleaned from noninvasive sensors. The three rounds will continue through 2026, with prizes totaling US $7 million.
Killer Drones Deployed From the Skies
A new class of missile-firing drones will take to the skies in 2024. Like a three-layer aerial nesting doll, the missile-stuffed drone is itself released from the belly of a bomber while in flight. The uncrewed aircraft was developed by energy and defense company General Atomics as part of the Defense Advanced Research Projects Agency’s LongShot program and will be flight-tested this year to prove its feasibility in air-based combat. Its goal is to extend the range and effectiveness of both air-to-air missiles and the current class of fighter jets while new aircraft are introduced.
Visible’s Anti-Activity Tracker
Long COVID and chronic fatigue often go unseen by others. But it’s important that people with these invisible illnesses understand how different activities affect their symptoms so they can properly pace their days. That’s why one man with long COVID, Harry Leeming, decided to create Visible, an app that helps users monitor activity and avoid overexertion. This year, according to Leeming, Visible will launch a premium version of the app that uses a specialized heart-rate monitor. While most wearables are meant for workouts, Leeming says, these armband monitors are optimized for lower heart rates to help people with both long COVID and fatigue. The app will also collect data from consenting users to help research these conditions.
Amazon Launches New Internet Service—Literally
Amazon expects to begin providing Internet service from space with Project Kuiper by the end of 2024. The US $10 billion project aims to expand reliable broadband Internet access to rural areas around the globe by launching a constellation of more than 3,000 satellites into low Earth orbit. While the project will take years to complete in full, Amazon is set to start beta testing with customers later this year. If successful, Kuiper could be integrated into the suite of Amazon Web Services. SpaceX’s Starlink, meanwhile, has been active since 2019 and already has 5,000 satellites in orbit.
Solar-Powered Test Drive
The next car you buy might be powered by the sun. Long awaited by potential customers and crowdfunders, solar electric vehicles (SEVs) made by the startup Aptera Motors are set to hit the road in 2024, the company says. Like the cooler cousin of an SUV, these three-wheeled SEVs feature a sleek, aerodynamic design to cut down on drag. The latest version of the vehicle combines plug-in capability with solar panels that cover its roof, allowing for a 1,600-kilometer range on a single charge and up to 65 km a day from solar power. Aptera says it aims to begin early production in 2024, with the first 2,000 vehicles set to be delivered to investors.
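Do those numbers add up? A rough back-of-the-envelope check suggests they can, though the drivetrain efficiency (about 62 watt-hours per kilometer, in line with Aptera’s oft-quoted ~100 Wh per mile) and the roughly 700 watts of installed panel capacity are our assumptions, not figures from the company:

# Plausibility check on the 65 km/day solar claim. Only solar_km_per_day
# comes from the article; efficiency and panel wattage are assumptions.
efficiency_wh_per_km = 62        # assumed drivetrain efficiency
solar_km_per_day = 65            # from the article
panel_watts = 700                # assumed installed solar capacity

daily_harvest_wh = solar_km_per_day * efficiency_wh_per_km   # ~4.0 kWh
sun_hours_needed = daily_harvest_wh / panel_watts            # ~5.8 peak-sun hours
print(f"needs ~{daily_harvest_wh / 1000:.1f} kWh/day, "
      f"or ~{sun_hours_needed:.1f} peak-sun hours")

A bit under 6 hours of strong sun per day is demanding but not implausible for a sunny climate, which is consistent with Aptera pitching the solar range as a daily top-up rather than a replacement for plug-in charging.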
Zero Trust, Two-Thirds Confidence
“Trust but verify” is now a proverb of the past in cybersecurity policy in the United States. By the end of the 2024 fiscal year, in September, all U.S. government agencies will be required to switch to a Zero Trust security architecture. All users must validate their identity and devices—even when they’re already connected to government networks and VPNs. This is achieved with methods like multifactor authentication and other access controls. About two-thirds of security professionals employed by federal agencies are confident that their department will hit the cybersecurity deadline, according to a 2023 report.
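In practice, Zero Trust means every request is re-authenticated, whether or not it originates inside the network perimeter. The sketch below illustrates that logic in Python; the names and checks are hypothetical, not any agency’s real API:

# Minimal Zero Trust sketch: every request re-validates identity, device,
# and MFA -- nothing is trusted just for being "inside" the network.
# All identifiers here are illustrative.
from dataclasses import dataclass

@dataclass
class Request:
    user_token: str        # short-lived identity token
    device_id: str         # attested device identifier
    mfa_verified: bool     # second factor checked this session

VALID_TOKENS = {"tok-alice", "tok-bob"}      # issued by the identity provider
TRUSTED_DEVICES = {"dev-4821", "dev-7710"}   # hardware inventory

def authorize(req: Request) -> bool:
    """Grant access only if identity, device, and MFA all check out --
    even for traffic already on a government network or VPN."""
    return (req.user_token in VALID_TOKENS
            and req.device_id in TRUSTED_DEVICES
            and req.mfa_verified)

print(authorize(Request("tok-alice", "dev-4821", True)))   # True
print(authorize(Request("tok-alice", "dev-9999", True)))   # False: unknown device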
First Light for Vera Rubin
Vera C. Rubin Observatory, home to the largest digital camera ever constructed, is expected to open its eye to the sky for the first time in late 2024. The observatory features an 8.4-meter wide-field telescope that will scan the Southern Hemisphere’s skies over the course of a decade-long project. Equipped with a 3,200-megapixel camera, the telescope will photograph an area the size of 40 full moons every night from its perch atop a Chilean mountain. That means it can capture the entire visible sky every three to four nights. When operational, the Rubin Observatory will help astronomers inventory the solar system, map the Milky Way, and shed light on dark matter and dark energy.
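Those coverage figures hang together on the back of an envelope. The quick check below uses only the numbers above plus two standard values not in the article: a full moon spans roughly half a degree, and the visible sky is roughly a 20,000-square-degree hemisphere:

import math

# Back-of-envelope check of Rubin's sky-coverage claim.
moon_area = math.pi * (0.5 / 2) ** 2            # ~0.196 deg^2 per full moon
field_of_view = 40 * moon_area                  # ~7.9 deg^2 per exposure
visible_sky = 20_000                            # deg^2, rough hemisphere

exposures_needed = visible_sky / field_of_view  # ~2,500 pointings
per_night = exposures_needed / 3.5              # spread over 3-4 nights
print(f"{field_of_view:.1f} deg^2 per shot; {exposures_needed:.0f} pointings; "
      f"~{per_night:.0f} exposures per night")

Roughly 700 to 850 exposures a night is an aggressive but achievable cadence for a survey telescope, which is what makes the three-to-four-night full-sky cycle credible.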
Hailing Air Taxis at the Olympics
At this year’s summer Olympic Games in Paris, attendees may be able to take an electric vertical-take-off-and-landing vehicle, or eVTOL, to get around the city. Volocopter, in Bruchsal, Germany, hopes to make an air taxi service available to sports enthusiasts and tourists during the competition. Though the company is still awaiting certification from the European Union Aviation Safety Agency, Volocopter plans to offer three routes between various parts of the city, as well as two round-trip routes for tourists. Volocopter’s air taxis could make Paris the first European city to offer eVTOL services.
Faster Than a Speeding Bullet
Boom Technology is developing an airliner, called Overture, that flies faster than the speed of sound. The U.S. company says it’s set to finish construction of its North Carolina “superfactory” in 2024. Each year Boom plans to manufacture as many as 33 of the aircraft, which the company claims will be the world’s fastest airliner. Overture is designed to be capable of flying twice as fast as today’s commercial planes, and Boom says it expects the plane to be powered by sustainable aviation fuel, made without petroleum. The company says it already has orders in place from commercial airlines and is aiming for first flight by 2027.
Ten years ago, at the DARPA Robotics Challenge (DRC) Trial event near Miami, I watched the most advanced humanoid robots ever built struggle their way through a scenario inspired by the Fukushima nuclear disaster. A team of experienced engineers controlled each robot, and overhead safety tethers kept them from falling over. The robots had to demonstrate mobility, sensing, and manipulation—which, with painful slowness, they did.
These robots were clearly research projects, but DARPA has a history of catalyzing technology with a long-term view. The DARPA Grand and Urban Challenges for autonomous vehicles, in 2005 and 2007, formed the foundation for today’s autonomous taxis. So, after DRC ended in 2015 with several of the robots successfully completing the entire final scenario, the obvious question was: When would humanoid robots make the transition from research project to a commercial product?
This article is part of our special report Top Tech 2024.
The answer seems to be 2024, when a handful of well-funded companies will be deploying their robots in commercial pilot projects to figure out whether humanoids are really ready to get to work.
One of the robots that made an appearance at the DRC Finals in 2015 was called ATRIAS, developed by Jonathan Hurst at the Oregon State University Dynamic Robotics Laboratory. In 2015, Hurst cofounded Agility Robotics to turn ATRIAS into a human-centric, multipurpose, and practical robot called Digit. Approximately the same size as a human, Digit stands 1.75 meters tall (about 5 feet, 8 inches), weighs 65 kilograms (about 140 pounds), and can lift 16 kg (about 35 pounds). Agility is now preparing to produce a commercial version of Digit at massive scale, and the company sees its first opportunity in the logistics industry, where it will start doing some of the jobs where humans are essentially acting like robots already.
Are humanoid robots useful?
“We spent a long time working with potential customers to find a use case where our technology can provide real value, while also being scalable and profitable,” Hurst says. “For us, right now, that use case is moving e-commerce totes.” Totes are standardized containers that warehouses use to store and transport items. As items enter or leave the warehouse, empty totes need to be continuously moved from place to place. It’s a vital job, and even in highly automated warehouses, much of that job is done by humans.
Agility says that in the United States, there are currently several million people working at tote-handling tasks, and logistics companies are having trouble keeping positions filled, because in some markets there are simply not enough workers available. Furthermore, the work tends to be dull, repetitive, and stressful on the body. “The people doing these jobs are basically doing robotic jobs,” says Hurst, and Agility argues that these people would be much better off doing work that’s more suited to their strengths. “What we’re going to have is a shifting of the human workforce into a more supervisory role,” explains Damion Shelton, Agility Robotics’ CEO. “We’re trying to build something that works with people,” Hurst adds. “We want humans for their judgment, creativity, and decision-making, using our robots as tools to do their jobs faster and more efficiently.”
For Digit to be an effective warehouse tool, it has to be capable, reliable, safe, and financially sustainable for both Agility and its customers. Agility is confident that all of this is possible, citing Digit’s potential relative to the cost and performance of human workers. “What we’re encouraging people to think about,” says Shelton, “is how much they could be saving per hour by being able to allocate their human capital elsewhere in the building.” Shelton estimates that a typical large logistics company spends at least US $30 per employee-hour for labor, including benefits and overhead. The employee, of course, receives much less than that.
Agility is not yet ready to provide pricing information for Digit, but we’re told that it will cost less than $250,000 per unit. Even at that price, if Digit achieves Agility’s goal of at least 20,000 working hours (five years of two shifts per day), the hourly rate of the robot comes to $12.50. A service contract would likely add a few dollars per hour to that. “You compare that against human labor doing the same task,” Shelton says, “and as long as it’s apples to apples in terms of the rate that the robot is working versus the rate that the human is working, you can decide whether it makes more sense to have the person or the robot.”
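That arithmetic is easy to verify. The short sketch below uses only the figures above; the $3-per-hour service estimate is our placeholder for “a few dollars”:

# Shelton's cost comparison, using the article's figures.
unit_price = 250_000      # upper bound on Digit's price, USD
working_hours = 20_000    # Agility's target: two shifts/day for five years
labor_rate = 30.0         # typical loaded cost per employee-hour, USD
service_rate = 3.0        # assumed service-contract cost, USD/hour

robot_rate = unit_price / working_hours   # $12.50/hour amortized
print(f"robot: ${robot_rate + service_rate:.2f}/h vs human: ${labor_rate:.2f}/h")

Even with servicing, the robot’s implied cost of roughly $15.50 per hour is about half the loaded labor rate, provided it really does match human throughput on the task.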
Agility’s robot won’t be able to match the general capability of a human, but that’s not the company’s goal. “Digit won’t be doing everything that a person can do,” says Hurst. “It’ll just be doing that one process-automated task,” like moving empty totes. In these tasks, Digit is able to keep up with (and in fact slightly exceed) the speed of the average human worker, when you consider that the robot doesn’t have to accommodate the needs of a frail human body.
Amazon’s experiments with warehouse robots
The first company to put Digit to the test is Amazon. In 2022, Amazon invested in Agility as part of its Industrial Innovation Fund, and late last year Amazon started testing Digit at its robotics research and development site near Seattle, Wash. Digit will not be lonely at Amazon—the company currently has more than 750,000 robots deployed across its warehouses, including legacy systems that operate in closed-off areas as well as more modern robots that have the necessary autonomy to work more collaboratively with people. These newer robots include autonomous mobile robotic bases like Proteus, which can move carts around warehouses, as well as stationary robot arms like Sparrow and Cardinal, which can handle inventory or customer orders in structured environments. But a robot with legs will be something new.
“What’s interesting about Digit is because of its bipedal nature, it can fit in spaces a little bit differently,” says Emily Vetterick, director of engineering at Amazon Global Robotics, who is overseeing Digit’s testing. “We’re excited to be at this point with Digit where we can start testing it, because we’re going to learn where the technology makes sense.”
Where two legs make sense has been an ongoing question in robotics for decades. Obviously, in a world designed primarily for humans, a robot with a humanoid form factor would be ideal. But balancing dynamically on two legs is still difficult for robots, especially when those robots are carrying heavy objects and are expected to work at a human pace for tens of thousands of hours. When is it worthwhile to use a bipedal robot instead of something simpler?
“The use case for Digit that I’m really excited about is empty tote recycling,” Vetterick says. “We already automate this task in a lot of our warehouses with a conveyor, a very traditional automation solution, and we wouldn’t want a robot in a place where a conveyor works. But a conveyor has a specific footprint, and it’s conducive to certain types of spaces. When we start to get away from those spaces, that’s where robots start to have a functional need to exist.”
The need for a robot doesn’t always translate into the need for a robot with legs, however, and a company like Amazon has the resources to build its warehouses to support whatever form of robotics or automation it needs. Its newer warehouses are indeed built that way, with flat floors, wide aisles, and other environmental considerations that are particularly friendly to robots with wheels.
“The building types that we’re thinking about [for Digit] aren’t our new-generation buildings. They’re older-generation buildings, where we can’t put in traditional automation solutions because there just isn’t the space for them,” says Vetterick. She describes the organized chaos of some of these older buildings as including narrower aisles with roof supports in the middle of them, and areas where pallets, cardboard, electrical cord covers, and ergonomics mats create uneven floors. “Our buildings are easy for people to navigate,” Vetterick continues. “But even small obstructions become barriers that a wheeled robot might struggle with, and where a walking robot might not.” Fundamentally, that’s the advantage bipedal robots offer relative to other form factors: They can quickly and easily fit into spaces and workflows designed for humans. Or at least, that’s the goal.
Vetterick emphasizes that the Seattle R&D site deployment is only a very small initial test of Digit’s capabilities. Having the robot move totes from a shelf to a conveyor across a flat, empty floor is not reflective of the use case that Amazon ultimately would like to explore. Amazon is not even sure that Digit will turn out to be the best tool for this particular job, and for a company so focused on efficiency, only the best solution to a specific problem will find a permanent home as part of its workflow. “Amazon isn’t interested in a general-purpose robot,” Vetterick explains. “We are always focused on what problem we’re trying to solve. I wouldn’t want to suggest that Digit is the only way to solve this type of problem. It’s one potential way that we’re interested in experimenting with.”
The idea of a general-purpose humanoid robot that can assist people with whatever tasks they may need is certainly appealing, but as Amazon makes clear, the first step for companies like Agility is to find enough value performing a single task (or perhaps a few different tasks) to achieve sustainable growth. Agility believes it can scale its business by solving Amazon’s empty-tote-recycling problem, and the company is confident enough that it’s preparing to open a factory in Salem, Ore. At peak production, the plant will be capable of manufacturing 10,000 Digit robots per year.
A menagerie of humanoids
Agility is not alone in its goal to commercially deploy bipedal robots in 2024. At least seven other companies, backed by hundreds of millions of dollars in funding, are working toward the same goal; among them, 1X, Apptronik, Figure, Sanctuary, Tesla, and Unitree all have commercial humanoid robot prototypes.
Despite an influx of money and talent into commercial humanoid robot development over the past two years, there have been no recent fundamental technological breakthroughs that will substantially aid these robots’ development. Sensors and computers are capable enough, but actuators remain complex and expensive, and batteries struggle to power bipedal robots for the length of a work shift.
There are other challenges as well, including creating a robot that’s manufacturable with a resilient supply chain and developing the service infrastructure to support a commercial deployment at scale. The biggest challenge by far is software. It’s not enough to simply build a robot that can do a job—that robot has to do the job with the kind of safety, reliability, and efficiency that will make it desirable as more than an experiment.
There’s no question that Agility Robotics and the other companies developing commercial humanoids have impressive technology, a compelling narrative, and an enormous amount of potential. Whether that potential translates into humanoid robots in the workplace now rests with companies like Amazon, which seem cautiously optimistic. It would be a fundamental shift in how repetitive labor is done. And now, all the robots have to do is deliver.
This article appears in the January 2024 print issue as “Year of the Humanoid.”