What does a platypus look like? Generating customized prompts for zero-shot image classification

TL;DR

Why engineer prompts yourself when GPT-3 can help?

Abstract

Open vocabulary models are a promising new paradigm for image classification. Unlike traditional classification models, open vocabulary models classify among any arbitrary set of categories specified with natural language during inference. This natural language, called "prompts", typically consists of a set of hand-written templates (e.g., "a photo of a {}") which are completed with each of the category names. This work introduces a simple method to generate higher-accuracy prompts, without using explicit knowledge of the image domain and with far fewer hand-constructed sentences. To achieve this, we combine open vocabulary models with large language models (LLMs) to create Customized Prompts via Language models (CuPL, pronounced "couple"). In particular, we leverage the knowledge contained in LLMs in order to generate many descriptive sentences that are customized for each object category. We find that this straightforward and general approach improves accuracy on a range of zero-shot image classification benchmarks, including over one percentage point gain on ImageNet. Finally, this method requires no additional training and remains completely zero-shot.
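The pipeline the abstract describes is short enough to sketch in code: generate many descriptive sentences per category with an LLM, embed them with an open vocabulary model, average the embeddings into one classifier vector per class, and classify images by cosine similarity. Below is a minimal illustrative sketch, assuming OpenAI's clip package (github.com/openai/CLIP) and PyTorch; the per-category descriptions are hand-written placeholders standing in for GPT-3 output, and example.jpg is a hypothetical image path. It is not the paper's reference implementation.

import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Step 1 (done with an LLM in the paper): collect many descriptive
# sentences per category, e.g. by prompting GPT-3 with questions like
# "What does a platypus look like?". Placeholders shown here.
cupl_prompts = {
    "platypus": [
        "A platypus is a small, semi-aquatic mammal with a duck-like bill.",
        "A platypus has dense brown fur, webbed feet, and a flat tail.",
    ],
    "goldfish": [
        "A goldfish is a small orange fish with shiny scales.",
        "A goldfish has a rounded body and flowing fins.",
    ],
}

# Step 2: embed every description, then average the normalized embeddings
# to get a single classifier weight vector per category.
class_weights = []
with torch.no_grad():
    for category, sentences in cupl_prompts.items():
        tokens = clip.tokenize(sentences).to(device)
        text_features = model.encode_text(tokens)
        text_features /= text_features.norm(dim=-1, keepdim=True)
        class_weights.append(text_features.mean(dim=0))
class_weights = torch.stack(class_weights)
class_weights /= class_weights.norm(dim=-1, keepdim=True)

# Step 3: zero-shot classification by cosine similarity between the
# image embedding and each class vector. No training is involved.
image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)
with torch.no_grad():
    image_features = model.encode_image(image)
    image_features /= image_features.norm(dim=-1, keepdim=True)
logits = image_features @ class_weights.T
prediction = list(cupl_prompts)[logits.argmax().item()]
print(prediction)

Averaging normalized text embeddings and re-normalizing is the same prompt-ensembling scheme used with hand-written templates; CuPL simply swaps the templates for LLM-generated, category-specific descriptions.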

Venue
The IEEE/CVF International Conference on Computer Vision (ICCV), 2023
BibTeX
@article{pratt2022does,
  title={What does a platypus look like? Generating customized prompts for zero-shot image classification},
  author={Pratt, Sarah and Liu, Rosanne and Farhadi, Ali},
  journal={arXiv preprint arXiv:2209.03320},
  year={2022}
}
Date