In this paper, we present a new framework that combines deep semantic segmentation with homography estimation to address the challenges of registering racket sports courts from broadcast videos. In particular, we deal with courts that present the following problems: (a) brushed and occluded lines, (b) illumination variations, and (c) unknown camera parameters. Given an input frame from a broadcast video, our approach employs an encoder-decoder deep neural network to predict a precise pixel-level segmentation mask, which is then used to estimate the homography matrix between the input frame and its reference court model. For a comprehensive evaluation, we have developed two datasets, one for badminton and one for tennis, that meet our specific needs. Because neither datasets nor code for state-of-the-art methods are publicly available, we compared our framework with a handcrafted approach that is widely used as a baseline in racket sports analysis. We show that our method outperforms the baseline in terms of both registration accuracy and per-frame inference latency.
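The homography step mentioned above can be illustrated with a minimal sketch. It assumes that at least four court keypoints (for example, the outer corners) have already been located in the frame, which the abstract does not specify; the function names `estimate_homography` and `project`, the pixel coordinates, and the use of a plain direct linear transform (DLT) are all illustrative assumptions, not the method of the paper. The reference model uses the standard badminton doubles court of 6.1 m by 13.4 m.

```python
import numpy as np

def estimate_homography(src, dst):
    """Estimate H such that dst ~ H @ src, via the direct linear transform.

    src, dst: arrays of shape (N, 2) with N >= 4 corresponding points.
    Hypothetical helper, not taken from the paper.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence contributes two linear constraints on H.
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    A = np.array(rows)
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]  # fix the scale so that H[2, 2] == 1

def project(H, pts):
    """Apply homography H to an (N, 2) array of points."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = homog @ H.T
    return mapped[:, :2] / mapped[:, 2:3]

# Reference court model: badminton doubles court corners, in metres.
court_model = np.array([[0.0, 0.0], [6.1, 0.0], [6.1, 13.4], [0.0, 13.4]])

# Hypothetical pixel positions of the same corners in one broadcast frame.
frame_corners = np.array([[412.0, 655.0], [868.0, 655.0],
                          [760.0, 248.0], [520.0, 248.0]])

# Map reference-model coordinates to frame pixels.
H_model_to_frame = estimate_homography(court_model, frame_corners)
net_centre_px = project(H_model_to_frame, [[3.05, 6.7]])
```

With exactly four correspondences the DLT fits the points exactly; with more, noisier keypoints, a robust estimator such as RANSAC would normally be placed around this step.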