Abstract
We present StyleBabel, a unique open-access dataset of natural language captions and free-form tags describing the artistic style of over 135K digital artworks, collected via a novel participatory method from experts studying at specialist art and design schools. The collection process was iterative and inspired by ‘Grounded Theory’, a qualitative approach that enables annotation while co-evolving a shared language for fine-grained artistic style attribute description. We demonstrate several downstream tasks for StyleBabel, adapting the recent ALADIN architecture for fine-grained style similarity to train cross-modal embeddings for: 1) free-form tag generation; 2) natural language description of artistic style; 3) fine-grained text search of style. To do so, we extend ALADIN with recent advances in Vision Transformer (ViT) and cross-modal representation learning, achieving state-of-the-art accuracy in fine-grained style retrieval.
Original language | English |
---|---|
Title of host publication | Computer Vision – ECCV 2022 |
Subtitle of host publication | 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part VIII |
Editors | Shai Avidan, Gabriel Brostow, Moustapha Cissé, Giovanni Maria Farinella, Tal Hassner |
Place of Publication | Cham, Switzerland |
Publisher | Springer |
Pages | 219-236 |
Number of pages | 18 |
Volume | 13668 |
ISBN (Electronic) | 9783031200748 |
ISBN (Print) | 9783031200731 |
Publication status | Published - 2022 |
Event | ECCV 2022: European Conference on Computer Vision (ECCV); Expo Tel Aviv, Tel Aviv, Israel; Duration: 23 Oct 2022 → 27 Oct 2022; https://eccv2022.ecva.net/ |
Publication series
Name | Lecture Notes in Computer Science |
---|---|
Publisher | Springer |
ISSN (Print) | 0302-9743 |
ISSN (Electronic) | 1611-3349 |
Conference
Conference | ECCV 2022 |
---|---|
Country/Territory | Israel |
City | Tel Aviv |
Period | 23/10/22 → 27/10/22 |
Keywords
- Datasets and evaluation
- Image and video retrieval
- Vision + language
- Vision applications and systems