Image classification and captioning model considering a CAM-based disagreement loss
- Authors
- Yoon, Yeo Chan; Park, So Young; Park, Soo Myoung; Lim, Heuiseok
- Issue Date
- Feb-2020
- Publisher
- WILEY
- Keywords
- deep learning; image captioning; image classification
- Citation
- ETRI JOURNAL, v.42, no.1, pp.67 - 77
- Indexed
- SCIE; SCOPUS; KCI
- Journal Title
- ETRI JOURNAL
- Volume
- 42
- Number
- 1
- Start Page
- 67
- End Page
- 77
- URI
- https://scholar.korea.ac.kr/handle/2021.sw.korea/57765
- DOI
- 10.4218/etrij.2018-0621
- ISSN
- 1225-6463
- Abstract
- Image captioning has received significant interest in recent years, and notable results have been achieved. Most previous approaches have focused on generating visual descriptions from images, whereas only a few have exploited visual descriptions for image classification. This study demonstrates that good performance can be achieved on both description generation and image classification through end-to-end joint learning with a loss function that encourages the two tasks to reach a consensus. Given images and their visual descriptions, the proposed model learns a multimodal intermediate embedding that represents both the textual and visual characteristics of an object. Sharing this multimodal embedding improves performance on both tasks. Through a novel loss function based on class activation mapping (CAM), which localizes the discriminative image regions of a model, we achieve higher scores when the captioning and classification models reach a consensus on the key parts of the object. Using the proposed model, we achieved substantially improved performance on each task on the UCSD Birds and Oxford Flowers datasets.
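- The CAM idea referenced above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a CAM for a class is computed as the standard weighted sum of the last convolutional feature maps, M_c(x, y) = Σ_k w_k^c · f_k(x, y). It is assumed that both the classification branch and the captioning branch expose one weight per shared feature channel. The normalization (clamp negatives, rescale to sum to 1) and the disagreement measure (L1 distance between normalized maps) are hypothetical choices; the exact form of the paper's disagreement loss is not given in this record. All function names are illustrative.

```python
from typing import List

FeatureMaps = List[List[List[float]]]  # indexed as [channel][row][col]
Map2D = List[List[float]]


def class_activation_map(features: FeatureMaps, weights: List[float]) -> Map2D:
    """Standard CAM: weighted sum over channels, M(x, y) = sum_k w_k * f_k(x, y)."""
    if len(features) != len(weights):
        raise ValueError("one weight per feature channel is required")
    rows, cols = len(features[0]), len(features[0][0])
    cam = [[0.0] * cols for _ in range(rows)]
    for w, fmap in zip(weights, features):
        for i in range(rows):
            for j in range(cols):
                cam[i][j] += w * fmap[i][j]
    return cam


def normalize(cam: Map2D) -> List[float]:
    """Assumed normalization: clamp negatives to zero, flatten, rescale to sum to 1."""
    flat = [max(v, 0.0) for row in cam for v in row]
    total = sum(flat)
    if total == 0.0:
        # Nothing is activated: fall back to a uniform map.
        return [1.0 / len(flat)] * len(flat)
    return [v / total for v in flat]


def cam_disagreement(cam_a: Map2D, cam_b: Map2D) -> float:
    """Hypothetical disagreement term: L1 distance between normalized maps, in [0, 2].

    0 means both branches attend to exactly the same regions;
    2 means their attended regions do not overlap at all.
    """
    pa, pb = normalize(cam_a), normalize(cam_b)
    return sum(abs(a - b) for a, b in zip(pa, pb))


# Usage: two 2x2 feature channels, each firing in a different corner.
features = [
    [[1.0, 0.0], [0.0, 0.0]],
    [[0.0, 0.0], [0.0, 1.0]],
]
cls_cam = class_activation_map(features, [1.0, 0.0])        # classifier weights (assumed)
cap_cam_agree = class_activation_map(features, [2.0, 0.0])  # captioner focuses on same corner
cap_cam_differ = class_activation_map(features, [0.0, 1.0]) # captioner focuses elsewhere
print(cam_disagreement(cls_cam, cap_cam_agree))   # 0.0
print(cam_disagreement(cls_cam, cap_cam_differ))  # 2.0
```

In a joint-training setup, a term like this would typically be added with a weighting factor to the classification and captioning losses so that minimizing it pushes both branches toward the same discriminative regions. How the paper combines and weights its loss terms is not stated in this record.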
Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.