TAB: Transformer Attention Bottlenecks enable User Intervention and Debugging in Vision-Language Models
TL;DR
A one-head attention bottleneck makes vision-language models more interpretable and user-debuggable.
Venue
The IEEE/CVF International Conference on Computer Vision (ICCV), 2025
BibTeX
@article{rahmanzadehgervi2024tab,
title={TAB: Transformer Attention Bottlenecks enable User Intervention and Debugging in Vision-Language Models},
author={Pooyan Rahmanzadehgervi and Hung Huy Nguyen and Rosanne Liu and Long Mai and Anh Totti Nguyen},
year={2024},
eprint={2412.18675},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
Date
December 2024