Detailed Information

Cited 0 times in Web of Science · Cited 0 times in Scopus

Visual question answering based on local-scene-aware referring expression generation

Full metadata record
DC Field Value Language
dc.contributor.author  Kim, J.-J.  -
dc.contributor.author  Lee, D.-G.  -
dc.contributor.author  Wu, J.  -
dc.contributor.author  Jung, H.-G.  -
dc.contributor.author  Lee, S.-W.  -
dc.date.accessioned  2021-12-02T00:41:33Z  -
dc.date.available  2021-12-02T00:41:33Z  -
dc.date.created  2021-08-31  -
dc.date.issued  2021-07  -
dc.identifier.issn  0893-6080  -
dc.identifier.uri  https://scholar.korea.ac.kr/handle/2021.sw.korea/128760  -
dc.description.abstract  Visual question answering requires a deep understanding of both images and natural language. However, most methods mainly focus on visual concepts, such as the relationships between various objects. The limited use of object categories combined with their relationships or simple question embedding is insufficient for representing complex scenes and explaining decisions. To address this limitation, we propose the use of text expressions generated for images, because such expressions have few structural constraints and can provide richer descriptions of images. The generated expressions can be incorporated with visual features and question embedding to obtain the question-relevant answer. A joint-embedding multi-head attention network is also proposed to model three different information modalities with co-attention. We quantitatively and qualitatively evaluated the proposed method on the VQA v2 dataset and compared it with state-of-the-art methods in terms of answer prediction. The quality of the generated expressions was also evaluated on the RefCOCO, RefCOCO+, and RefCOCOg datasets. Experimental results demonstrate the effectiveness of the proposed method and reveal that it outperformed all of the competing methods in terms of both quantitative and qualitative results. © 2021 Elsevier Ltd  -
dc.language  English  -
dc.language.iso  en  -
dc.publisher  Elsevier Ltd  -
dc.subject  Natural language processing systems  -
dc.subject  Quality control  -
dc.subject  Visual languages  -
dc.subject  Joint-embedding multi-head attention  -
dc.subject  Natural languages  -
dc.subject  Object categories  -
dc.subject  Question Answering  -
dc.subject  Question-embedding  -
dc.subject  Referring expression generation  -
dc.subject  Referring expressions  -
dc.subject  Simple++  -
dc.subject  Visual concept  -
dc.subject  Visual question answering  -
dc.subject  Embeddings  -
dc.subject  article  -
dc.subject  attention network  -
dc.subject  embedding  -
dc.subject  human  -
dc.subject  human experiment  -
dc.subject  prediction  -
dc.subject  quantitative analysis  -
dc.title  Visual question answering based on local-scene-aware referring expression generation  -
dc.type  Article  -
dc.contributor.affiliatedAuthor  Lee, S.-W.  -
dc.identifier.doi  10.1016/j.neunet.2021.02.001  -
dc.identifier.scopusid  2-s2.0-85102406840  -
dc.identifier.wosid  000652750100013  -
dc.identifier.bibliographicCitation  Neural Networks, v.139, pp.158 - 167  -
dc.relation.isPartOf  Neural Networks  -
dc.citation.title  Neural Networks  -
dc.citation.volume  139  -
dc.citation.startPage  158  -
dc.citation.endPage  167  -
dc.type.rims  ART  -
dc.type.docType  Article  -
dc.description.journalClass  1  -
dc.description.journalRegisteredClass  scie  -
dc.description.journalRegisteredClass  scopus  -
dc.relation.journalResearchArea  Computer Science  -
dc.relation.journalResearchArea  Neurosciences & Neurology  -
dc.relation.journalWebOfScienceCategory  Computer Science, Artificial Intelligence  -
dc.relation.journalWebOfScienceCategory  Neurosciences  -
dc.subject.keywordPlus  Natural language processing systems  -
dc.subject.keywordPlus  Quality control  -
dc.subject.keywordPlus  Visual languages  -
dc.subject.keywordPlus  Joint-embedding multi-head attention  -
dc.subject.keywordPlus  Natural languages  -
dc.subject.keywordPlus  Object categories  -
dc.subject.keywordPlus  Question Answering  -
dc.subject.keywordPlus  Question-embedding  -
dc.subject.keywordPlus  Referring expression generation  -
dc.subject.keywordPlus  Referring expressions  -
dc.subject.keywordPlus  Simple++  -
dc.subject.keywordPlus  Visual concept  -
dc.subject.keywordPlus  Visual question answering  -
dc.subject.keywordPlus  Embeddings  -
dc.subject.keywordPlus  article  -
dc.subject.keywordPlus  attention network  -
dc.subject.keywordPlus  embedding  -
dc.subject.keywordPlus  human  -
dc.subject.keywordPlus  human experiment  -
dc.subject.keywordPlus  prediction  -
dc.subject.keywordPlus  quantitative analysis  -
dc.subject.keywordAuthor  Joint-embedding multi-head attention  -
dc.subject.keywordAuthor  Referring expression generation  -
dc.subject.keywordAuthor  Visual question answering  -
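
The abstract above mentions a joint-embedding multi-head attention network that models three modalities (image, question, and generated referring expressions) with co-attention. As a rough illustration only — this is not the authors' architecture; the array shapes, random projections in place of learned weights, and the fusion of visual and expression features into one context are all assumptions of the sketch — multi-head co-attention where question tokens attend over a joint visual-plus-expression context can be written in NumPy as:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(query, context, num_heads=4, rng=None):
    """Scaled dot-product attention with several heads (illustrative only).

    query:   (n_q, d) features of one modality (e.g., question tokens)
    context: (n_c, d) features attended over (e.g., visual + expression features)
    Returns the fused (n_q, d) features and the (heads, n_q, n_c) attention maps.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    n_q, d = query.shape
    assert d % num_heads == 0
    d_h = d // num_heads
    # Random projections stand in for learned weight matrices.
    W_q, W_k, W_v = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    Q = (query @ W_q).reshape(n_q, num_heads, d_h).transpose(1, 0, 2)
    K = (context @ W_k).reshape(-1, num_heads, d_h).transpose(1, 0, 2)
    V = (context @ W_v).reshape(-1, num_heads, d_h).transpose(1, 0, 2)
    att = softmax(Q @ K.transpose(0, 2, 1) / np.sqrt(d_h))  # (heads, n_q, n_c)
    out = (att @ V).transpose(1, 0, 2).reshape(n_q, d)
    return out, att

# Three hypothetical modalities: visual regions, question tokens,
# and tokens of a generated referring expression.
rng = np.random.default_rng(42)
visual = rng.standard_normal((36, 64))      # 36 region features
question = rng.standard_normal((14, 64))    # 14 question tokens
expression = rng.standard_normal((10, 64))  # 10 expression tokens
# Co-attention step: question queries attend over the joint
# visual + expression context to produce question-conditioned features.
fused, att = multi_head_attention(question, np.vstack([visual, expression]),
                                  num_heads=4, rng=rng)
```

Each attention head produces a distribution over all 46 context items per question token, so the model can weight image regions and expression words jointly when forming the answer representation.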
Files in This Item
There are no files associated with this item.
Appears in
Collections
Graduate School > Department of Artificial Intelligence > 1. Journal Articles

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

Lee, Seong Whan
Department of Artificial Intelligence
