Adrian et al. published their work on multi-agent digital twins. It advances the state of the art in semantic state estimation with a simple but effective trick.

Instead of tuning a single simulation to cover every aspect of interaction with the environment, different models are assigned to different aspects and managed together in a model ensemble.
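The core idea can be sketched in a few lines. The following is a minimal, hypothetical illustration (the class and names are my own, not from the paper): each object in the scene is assigned to exactly one simulator, and that responsibility can be reassigned on-line based on the semantic label of the current action.

```python
class SimulationEnsemble:
    """Illustrative sketch of routing objects to heterogeneous simulators."""

    def __init__(self, simulators):
        # simulators: dict mapping simulator name -> set of action labels
        # it can represent well, e.g. {"rigid_body": {"grasp"}, "fluid": {"pour"}}
        self.simulators = simulators
        self.assignment = {}  # object id -> simulator currently responsible

    def assign(self, obj, action_label):
        # Reassign the object to the first simulator whose capabilities
        # cover the annotated action; every object stays covered by
        # exactly one simulator at any time.
        for name, capabilities in self.simulators.items():
            if action_label in capabilities:
                self.assignment[obj] = name
                return name
        raise ValueError(f"no simulator handles action {action_label!r}")


ensemble = SimulationEnsemble({
    "rigid_body": {"grasp", "push"},
    "fluid": {"pour"},
})
ensemble.assign("cup", "pour")   # the cup is simulated by the fluid model
ensemble.assign("cup", "grasp")  # responsibility moves to the rigid-body model
```

The real framework works on semantically annotated action sequences and couples full simulation engines, but the routing decision it makes per object follows this shape.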

Abstract: To become helpful assistants in our daily lives, robots must be able to understand the effects of their actions on their environment. A modern approach to this is the use of a physics simulation, where often very general simulation engines are utilized. As a result, specific modeling features, such as multi-contact simulation or fluid dynamics, may not be well represented. To improve the representativeness of simulations, we propose a framework for combining estimations of multiple heterogeneous simulations into a single one. The framework couples multiple simulations and reorganizes them based on semantically annotated action sequence information. While each object in the scene is always covered by a simulation, this simulation responsibility can be reassigned on-line. In this paper, we introduce the concept of the framework, describe the architecture, and demonstrate two example implementations. Eventually, we demonstrate how the framework can be used to simulate action executions on the humanoid robot Rollin' Justin with the goal of extracting the semantic state, and how this information is used to assess whether an action sequence is executed successfully or not.