|
Visual scene understanding is a fundamental and complex task in computer vision: it requires not only identifying objects in isolation, but also recognizing the relationships between them. These relationships can be abstracted into a semantic representation of (subject, predicate, object) triples, resulting in a scene graph that captures much of the visual information and semantics in the scene. In recent years, scene graph generation with message-passing mechanisms has been an active area of research, as it has the potential to capture global dependencies between objects and their relationships. Inspired by these developments, this paper introduces a novel scene graph generation approach based on spatial relationships. Our approach first classifies the spatial relationship between each pair of objects to generate an initial scene graph. Then, based on semantic features, the model detects action relationships in the scene and updates the scene graph by applying the message-passing mechanism. We conclude by comparing the proposed method with state-of-the-art approaches and demonstrating its effectiveness on the Visual Genome dataset.
|
|