Image classification and captioning model considering a CAM-based disagreement loss

Full metadata record
DC Field | Value | Language
dc.contributor.author | Yoon, Yeo Chan | -
dc.contributor.author | Park, So Young | -
dc.contributor.author | Park, Soo Myoung | -
dc.contributor.author | Lim, Heuiseok | -
dc.date.accessioned | 2021-08-31T11:01:40Z | -
dc.date.available | 2021-08-31T11:01:40Z | -
dc.date.created | 2021-06-19 | -
dc.date.issued | 2020-02 | -
dc.identifier.issn | 1225-6463 | -
dc.identifier.uri | https://scholar.korea.ac.kr/handle/2021.sw.korea/57765 | -
dc.description.abstract | Image captioning has received significant interest in recent years, and notable results have been achieved. Most previous approaches have focused on generating visual descriptions from images, whereas a few approaches have exploited visual descriptions for image classification. This study demonstrates that a good performance can be achieved for both description generation and image classification through an end-to-end joint learning approach with a loss function, which encourages each task to reach a consensus. When given images and visual descriptions, the proposed model learns a multimodal intermediate embedding, which can represent both the textual and visual characteristics of an object. The performance can be improved for both tasks by sharing the multimodal embedding. Through a novel loss function based on class activation mapping, which localizes the discriminative image region of a model, we achieve a higher score when the captioning and classification model reaches a consensus on the key parts of the object. Using the proposed model, we established a substantially improved performance for each task on the UCSD Birds and Oxford Flowers datasets. | -
dc.language | English | -
dc.language.iso | en | -
dc.publisher | WILEY | -
dc.title | Image classification and captioning model considering a CAM-based disagreement loss | -
dc.type | Article | -
dc.contributor.affiliatedAuthor | Lim, Heuiseok | -
dc.identifier.doi | 10.4218/etrij.2018-0621 | -
dc.identifier.scopusid | 2-s2.0-85079077761 | -
dc.identifier.wosid | 000479559800001 | -
dc.identifier.bibliographicCitation | ETRI JOURNAL, v.42, no.1, pp.67 - 77 | -
dc.relation.isPartOf | ETRI JOURNAL | -
dc.citation.title | ETRI JOURNAL | -
dc.citation.volume | 42 | -
dc.citation.number | 1 | -
dc.citation.startPage | 67 | -
dc.citation.endPage | 77 | -
dc.type.rims | ART | -
dc.type.docType | Article | -
dc.identifier.kciid | ART002556918 | -
dc.description.journalClass | 1 | -
dc.description.journalRegisteredClass | scie | -
dc.description.journalRegisteredClass | scopus | -
dc.description.journalRegisteredClass | kci | -
dc.relation.journalResearchArea | Engineering | -
dc.relation.journalResearchArea | Telecommunications | -
dc.relation.journalWebOfScienceCategory | Engineering, Electrical & Electronic | -
dc.relation.journalWebOfScienceCategory | Telecommunications | -
dc.subject.keywordAuthor | deep learning | -
dc.subject.keywordAuthor | image captioning | -
dc.subject.keywordAuthor | image classification | -
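
Illustrative note on the abstract: the record does not give the formula of the CAM-based loss, so the following is only a minimal sketch under stated assumptions, not the authors' method. It shows the standard class activation map (a weighted sum of the final convolutional feature maps) and one hypothetical way to score how much two such maps disagree. The function names, the normalization, and the half-L1 distance are all assumptions made for illustration.

```python
from typing import List

Grid = List[List[float]]


def class_activation_map(features: List[Grid], class_weights: List[float]) -> Grid:
    """Standard CAM for one class: sum over channels of weight * feature map."""
    height, width = len(features[0]), len(features[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for channel, weight in zip(features, class_weights):
        for y in range(height):
            for x in range(width):
                cam[y][x] += weight * channel[y][x]
    return cam


def normalize(grid: Grid) -> Grid:
    """Clip negatives to zero and rescale so that all cells sum to 1."""
    clipped = [[max(v, 0.0) for v in row] for row in grid]
    total = sum(sum(row) for row in clipped)
    if total == 0.0:
        # Degenerate map: fall back to a uniform distribution.
        n = len(grid) * len(grid[0])
        return [[1.0 / n for _ in row] for row in grid]
    return [[v / total for v in row] for row in clipped]


def disagreement(map_a: Grid, map_b: Grid) -> float:
    """Hypothetical disagreement score (an assumption, not from the paper):
    half the L1 distance between the normalized maps.
    0.0 means identical focus, 1.0 means completely disjoint focus."""
    a, b = normalize(map_a), normalize(map_b)
    return 0.5 * sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


# Toy 2x2 feature maps with two channels (made-up numbers).
features = [
    [[1.0, 0.0], [0.0, 0.0]],  # channel 0 fires top-left
    [[0.0, 0.0], [0.0, 1.0]],  # channel 1 fires bottom-right
]
cam_classifier = class_activation_map(features, [1.0, 0.0])
cam_captioner = class_activation_map(features, [0.0, 1.0])
print(disagreement(cam_classifier, cam_captioner))   # maps focus on different cells
print(disagreement(cam_classifier, cam_classifier))  # identical maps
```

In a joint training setup of the kind the abstract describes, a term like this could be added to the task losses so that the classifier and the captioner are penalized when they attend to different regions; how the published model actually combines the terms is not stated in this record.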
Appears in Collections
Graduate School > Department of Computer Science and Engineering > 1. Journal Articles

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.