Gupta SK, Zhang M, Wu CC, Wolfe JM, Kreiman G (2021). Visual Search Asymmetry: Deep Nets and Humans Share Similar Inherent Biases. NeurIPS. arXiv 2106.02953

Context is fundamental to biological and computer vision. In this work, the authors introduce a new out-of-context dataset (OCD) with fine-grained control over scene context. This dataset is evaluated through psychophysics experiments in humans and also through state-of-the-art computer vision architectures. The authors also introduce a new context-aware recognition transformer model (CRTNet) to reason about context in visual scenes.
See the paper by Bomatter et al., ICCV 2021
See also the work on contextual reasoning by Zhang et al., CVPR 2020
An integrated computational model of visual search combining eccentricity, bottom-up and top-down cues. Indian Institute of Technology Kanpur (2021).
Read his thesis here
Read his NeurIPS 2021 paper related to his thesis work. Visual search asymmetry: DeepNets and Humans share similar inherent biases. Gupta et al, NeurIPS 2021
Finding any Waldo: Zero-shot invariant visual search. Gabriel Kreiman @ Systems Neuroscience Club.
Frontiers in Perception Science
2011. Special Issue on “The timing of visual recognition”
Edited by Rufin VanRullen and Gabriel Kreiman
In a small fraction of a second, we can recognize objects in complex scenes in spite of significant transformations in the objects themselves and in other parts of the image. Achieving a high degree of selectivity, tolerance and speed in visual recognition remains a challenging problem for engineering and computational approaches to vision. The dynamics of visual recognition have played a significant role in shaping and constraining theoretical and experimental approaches to studying visual recognition. Converging evidence from neurophysiological recordings, scalp electroencephalography and psychophysics suggests that a lot of the magic in recognition happens in the initial “glimpse” during the first 100-200 ms after the presentation of, or a saccade to, a complex scene. This has prompted many authors to propose that the initial aspects of visual recognition depend on a largely feed-forward processing mode. At the same time, at the anatomical level, we know that there are massive back-projections throughout visual cortex, which might also play a critical role in the recognition process. Understanding the circuits and computations involved in rapid visual recognition represents a central model for quantitatively characterizing the function of neocortex. The Research Topic will bring together the leading scientists who have contributed to this field over the last 20 years, to create a collection that will serve as a reference for future generations of students and researchers.