We present a real-time human instance segmentation (HIS) framework for event-based cameras, achieving robust live segmentation of human subjects under challenging motion-sparse conditions. Event cameras offer high temporal resolution and low latency, but their inherent sparsity, especially in static-camera setups, makes full-body segmentation difficult. To address this, we introduce nSurreal, a large-scale synthetic dataset derived from the SURREAL human motion corpus and converted into event streams with depth supervision. This dataset enables effective training of Eseg, our lightweight CNN–LSTM architecture designed to fuse spatial encoding with temporal memory. Eseg not only reconstructs coherent human contours from incomplete event streams but also generalizes to live input from Prophesee cameras, operating in real time on consumer-grade GPUs. Our results demonstrate, for the first time, accurate live human segmentation from event-based input, establishing a new direction for neuromorphic vision in robotics, surveillance, and human–computer interaction.