22nd AIAI 2026, 16 - 19 July 2026, Chania, Crete, Greece

PRISM-MOT: A Compact Survey of Multi-Object Tracking through Classical, Transformer, and Foundation-Model Lenses

Ullah Mohib, Afridi Hina, Yamin Saira, Ullah Habib

Abstract:

  Multi-object tracking (MOT) has evolved from a predominantly detector-and-association problem into a general problem of temporal perception that involves memory, segmentation, open-vocabulary reasoning, and large-scale pretraining. Previous surveys formalised the classical taxonomy of motion, appearance, interaction, and inference cues, and focused on deep tracking systems and established benchmarks such as MOTChallenge. Our work covers MOT basics while providing a compact but up-to-date snapshot of the field. We briefly review the standard tracking-by-detection formulation and the representative metrics still used to organise evaluation. We summarise the lines of architectural development, highlight the emergence of end-to-end transformer trackers, and discuss the recent shift toward foundation-model-driven tracking, both through segmentation-centric pipelines and via open-vocabulary methods. We then discuss how newer benchmarks have changed the research agenda and how new metrics matter alongside legacy ones. We include a simple mathematical formulation that connects the classical assignment view of MOT to learned identity decoding. The paper concludes with a discussion of open problems in generalisation, multi-camera and 3D reasoning, uncertainty calibration, and semantic/language-aware tracking. Our work is not an exhaustive review; rather, it offers a compact overview that still captures the centre of gravity of MOT research today.
