Image classification and captioning model considering a CAM-based disagreement loss

Yoon, Yeo Chan; Park, So Young; Park, Soo Myoung; Lim, Heuiseok

doi:10.4218/etrij.2018-0621

Detailed Information

Cited 0 times in Web of Science

Cited 0 times in Scopus


Full metadata record

DC Field	Value	Language
dc.contributor.author	Yoon, Yeo Chan	-
dc.contributor.author	Park, So Young	-
dc.contributor.author	Park, Soo Myoung	-
dc.contributor.author	Lim, Heuiseok	-
dc.date.accessioned	2021-08-31T11:01:40Z	-
dc.date.available	2021-08-31T11:01:40Z	-
dc.date.created	2021-06-19	-
dc.date.issued	2020-02	-
dc.identifier.issn	1225-6463	-
dc.identifier.uri	https://scholar.korea.ac.kr/handle/2021.sw.korea/57765	-
dc.description.abstract	Image captioning has received significant interest in recent years, and notable results have been achieved. Most previous approaches have focused on generating visual descriptions from images, whereas only a few have exploited visual descriptions for image classification. This study demonstrates that good performance can be achieved for both description generation and image classification through an end-to-end joint learning approach with a loss function that encourages the two tasks to reach a consensus. Given images and visual descriptions, the proposed model learns a multimodal intermediate embedding that represents both the textual and visual characteristics of an object. Sharing this multimodal embedding improves performance on both tasks. Through a novel loss function based on class activation mapping, which localizes the discriminative image regions of a model, we achieve a higher score when the captioning and classification models reach a consensus on the key parts of the object. Using the proposed model, we achieved substantially improved performance on each task on the UCSD Birds and Oxford Flowers datasets.	-
dc.language	English	-
dc.language.iso	en	-
dc.publisher	WILEY	-
dc.title	Image classification and captioning model considering a CAM-based disagreement loss	-
dc.type	Article	-
dc.contributor.affiliatedAuthor	Lim, Heuiseok	-
dc.identifier.doi	10.4218/etrij.2018-0621	-
dc.identifier.scopusid	2-s2.0-85079077761	-
dc.identifier.wosid	000479559800001	-
dc.identifier.bibliographicCitation	ETRI JOURNAL, v.42, no.1, pp.67 - 77	-
dc.relation.isPartOf	ETRI JOURNAL	-
dc.citation.title	ETRI JOURNAL	-
dc.citation.volume	42	-
dc.citation.number	1	-
dc.citation.startPage	67	-
dc.citation.endPage	77	-
dc.type.rims	ART	-
dc.type.docType	Article	-
dc.identifier.kciid	ART002556918	-
dc.description.journalClass	1	-
dc.description.journalRegisteredClass	scie	-
dc.description.journalRegisteredClass	scopus	-
dc.description.journalRegisteredClass	kci	-
dc.relation.journalResearchArea	Engineering	-
dc.relation.journalResearchArea	Telecommunications	-
dc.relation.journalWebOfScienceCategory	Engineering, Electrical & Electronic	-
dc.relation.journalWebOfScienceCategory	Telecommunications	-
dc.subject.keywordAuthor	deep learning	-
dc.subject.keywordAuthor	image captioning	-
dc.subject.keywordAuthor	image classification	-

Files in This Item: There are no files associated with this item.

Appears in Collections: Graduate School > Department of Computer Science and Engineering > 1. Journal Articles


Certain data included herein are derived from the © Web of Science of Clarivate Analytics. All rights reserved.
You may not copy or re-distribute this material in whole or in part without the prior written consent of Clarivate Analytics.
