Zero-Shot Object Detection: Learning to
Simultaneously Recognize and Localize Novel Concepts

Shafin Rahman
Salman Khan
Fatih Porikli

Australian National University, Data61-CSIRO
[Paper + Supplementary Material]

Zero-shot detection deals with a more complex label space (object labels and locations) with considerably less supervision (i.e., no examples of unseen classes). (a) The traditional recognition task predicts only seen class labels. (b) The traditional detection task predicts both seen class labels and bounding boxes. (c) The traditional zero-shot recognition task predicts only unseen class labels. (d) The proposed ZSD predicts both seen and unseen classes and their bounding boxes.

Current Zero-Shot Learning (ZSL) approaches are restricted to recognition of a single dominant unseen object category in a test image. We hypothesize that this setting is ill-suited for real-world applications where unseen objects appear only as a part of a complex scene, warranting both the ‘recognition’ and ‘localization’ of an unseen category. To address this limitation, we introduce a new ‘Zero-Shot Detection’ (ZSD) problem setting, which aims at simultaneously recognizing and locating object instances belonging to novel categories without any training examples. We also propose a new experimental protocol for ZSD based on the highly challenging ILSVRC dataset, adhering to practical issues, e.g., the rarity of unseen objects. To the best of our knowledge, we present the first end-to-end deep network for ZSD that jointly models the interplay between visual and semantic domain information. To overcome the noise in the automatically derived semantic descriptions, we utilize the concept of meta-classes to design an original loss function that achieves synergy between max-margin class separation and semantic space clustering. Furthermore, we present a baseline approach extended from the recognition setting to the detection setting. Our extensive experiments show a significant performance boost over the baseline on the important yet difficult ZSD problem.
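To make the idea of combining max-margin class separation with meta-class clustering more concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the loss from the paper: the function name `margin_and_cluster_loss`, the soft-margin (softplus) form, the `weight` parameter, and the way meta-classes are encoded as integer ids are all hypothetical choices made for this example.

```python
import numpy as np


def softplus(x):
    # Numerically stable log(1 + exp(x)), used as a smooth margin penalty.
    return np.logaddexp(0.0, x)


def margin_and_cluster_loss(scores, true_idx, meta_of, weight=1.0):
    """Illustrative loss for one candidate box (hypothetical formulation).

    scores   : (C,) predicted score for each class
    true_idx : index of the ground-truth class
    meta_of  : (C,) integer meta-class id of each class (assumed given)
    weight   : assumed trade-off between the two terms
    """
    scores = np.asarray(scores, dtype=float)
    meta_of = np.asarray(meta_of)

    # Max-margin term: the true class should outscore every other class.
    others = np.arange(scores.size) != true_idx
    margin = softplus(scores[others] - scores[true_idx]).mean()

    # Clustering term: classes in the true class's meta-class should
    # outscore classes that belong to other meta-classes.
    same = meta_of == meta_of[true_idx]
    inside, outside = scores[same], scores[~same]
    if outside.size == 0:
        cluster = 0.0  # only one meta-class present, nothing to separate
    else:
        cluster = softplus(outside[None, :] - inside[:, None]).mean()

    return margin + weight * cluster


# Four classes grouped into two meta-classes; class 0 is the ground truth.
meta = [0, 0, 1, 1]
good = margin_and_cluster_loss([5.0, 4.0, -3.0, -3.0], 0, meta)
bad = margin_and_cluster_loss([-3.0, -3.0, 5.0, 4.0], 0, meta)
print(good < bad)
```

In this sketch, the second term rewards predictions that rank semantically related classes together, which is one plausible way to reduce sensitivity to noise in individual class descriptions.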




S. Rahman, S. H. Khan, F. Porikli.
Zero-Shot Object Detection: Learning to Simultaneously Recognize and Localize Novel Concepts.
Asian Conference on Computer Vision (ACCV), Perth, December 2018.
[Link to Paper]


Our framework allows us to detect both seen and unseen object categories in natural images. Below we show some sample results on the ILSVRC dataset, where our method correctly detected 'unseen' objects.

None of the detected classes were seen by the model during training.


S. Rahman, S. H. Khan and F. Porikli, “Zero-Shot Object Detection: Learning to Simultaneously Recognize and Localize Novel Concepts,” Asian Conference on Computer Vision (ACCV), Perth, 2018.

title={Zero-Shot Object Detection: Learning to Simultaneously Recognize and Localize Novel Concepts},
author={Rahman, Shafin and Khan, Salman and Porikli, Fatih},
journal={Asian Conference on Computer Vision (ACCV)},
publisher={LNCS, Springer},
year={2018} }

Followup Work

Please check out our new work titled "Polar Loss for Zero-shot Object Detection".

