PinterNet: A thematic label curation tool for large image datasets


Recent progress in big data and in computer vision with deep learning models has attracted considerable attention. Deep learning has been applied to tasks such as image classification, object detection, image segmentation, image captioning, and visual question answering, all of which rely on large collections of annotated images. This calls for more curated large image datasets with clearer descriptions, cleaner contents, and broader usability. However, curating and labeling such datasets can be labor-intensive. In this paper, we present PinterNet, an algorithm for automatic curation and label generation from noisy textual descriptions, and we publish a large image dataset containing over 110K images automatically labeled with their themes. Our dataset is hierarchical in nature: it has high-level category information, which we refer to as verticals, with fine-grained thematic labels at the lower level. This motivates a new type of hierarchical theme classification problem that is closer to human cognition and has business value. We provide benchmark performance for this novel task and new dataset using deep learning models based on the AlexNet architecture with different pre-training schemes.

In IEEE International Conference on Big Data (Big Data) 2016.
@inproceedings{liu2016pinternet,
  title={PinterNet: A thematic label curation tool for large image datasets},
  author={Liu, Ruoqian and Palsetia, Diana and Paul, Arindam and Al-Bahrani, Reda and Jha, Dipendra and Liao, Wei-keng and Agrawal, Ankit and Choudhary, Alok},
  booktitle={2016 IEEE International Conference on Big Data (Big Data)},
  pages={2353--2362},
  year={2016},
  organization={IEEE}
}