22 May 2026
Modelling and predicting the motion of soft robots remains challenging because their dynamics are infinite-dimensional and highly nonlinear. A promising direction is to learn these dynamics directly from high-dimensional sensory streams. Standard RGB cameras, however, suffer from motion blur, which makes fast transients hard to capture and biases learning toward steady-state behaviours.
In this study, EMERGE partners from the Delft University of Technology turn to event-based cameras, which provide asynchronous, high-frequency visual information better suited to capturing dynamic deformations. They propose a learning architecture that encodes two-channel event frames from a dynamic vision sensor (DVS) with a convolutional autoencoder while jointly learning a compact latent representation of the robot’s dynamics. Validated in both simulation and real-world experiments, the framework predicts long-horizon soft-robot motion with high accuracy and consistency from a single initial event frame and a control sequence.
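To give a sense of what a two-channel event frame can look like, here is a minimal Python sketch that accumulates a batch of events into a per-pixel count image, with one channel per polarity. It is an illustration under stated assumptions, not the authors' preprocessing: the function name `events_to_frame`, the channel order (positive events first), and the use of raw counts are all hypothetical choices, and the paper may bin or normalise events differently.

```python
import numpy as np

def events_to_frame(xs, ys, polarities, height, width):
    """Accumulate events into a (2, height, width) count frame.

    Assumed layout (not taken from the paper):
    channel 0 counts positive-polarity events,
    channel 1 counts negative-polarity events.
    """
    frame = np.zeros((2, height, width), dtype=np.float32)
    xs = np.asarray(xs, dtype=np.intp)
    ys = np.asarray(ys, dtype=np.intp)
    # Map polarity > 0 to channel 0, everything else to channel 1.
    channels = np.where(np.asarray(polarities) > 0, 0, 1)
    # np.add.at accumulates correctly when the same pixel fires repeatedly.
    np.add.at(frame, (channels, ys, xs), 1.0)
    return frame

# Three toy events: two positive at pixel (x=1, y=0), one negative at (x=3, y=2).
frame = events_to_frame(
    xs=[1, 1, 3], ys=[0, 0, 2], polarities=[1, 1, -1], height=4, width=5
)
```

A frame of this shape could then be passed to a convolutional encoder; how the time window for accumulation is chosen is left open here.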
Read the paper in the link below.

