Context is fundamental to both biological and computer vision. In this work, the authors introduce a new out-of-context dataset (OCD) with fine-grained control over scene context. Human observers, in psychophysics experiments, and state-of-the-art computer vision architectures are both evaluated on this dataset. The authors also introduce a new context-aware recognition transformer model (CRTNet) that reasons about context in visual scenes.
See the paper by Bomatter et al., ICCV 2021.
See also the work on contextual reasoning by Zhang et al., CVPR 2020.