Multimodal Emotion Recognition Fusion Analysis Adapting BERT With Heterogeneous Feature Unification
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Lee, Sanghyun | - |
dc.contributor.author | Han, David K. | - |
dc.contributor.author | Ko, Hanseok | - |
dc.date.accessioned | 2021-12-07T10:41:47Z | - |
dc.date.available | 2021-12-07T10:41:47Z | - |
dc.date.created | 2021-08-30 | - |
dc.date.issued | 2021-06 | - |
dc.identifier.issn | 2169-3536 | - |
dc.identifier.uri | https://scholar.korea.ac.kr/handle/2021.sw.korea/130068 | - |
dc.description.abstract | Human communication includes rich emotional content, so multimodal emotion recognition plays an important role in communication between humans and computers. Because of the complex emotional characteristics of a speaker, emotion recognition remains a challenge, particularly in capturing emotional cues across a variety of modalities, such as speech, facial expressions, and language. Audio and visual cues are particularly vital for a human observer in understanding emotions. However, most previous work on emotion recognition has been based solely on linguistic information, which can overlook various forms of nonverbal information. In this paper, we present a new multimodal emotion recognition approach that improves the BERT model for emotion recognition by combining it with heterogeneous features based on language, audio, and visual modalities. Specifically, we adapt the BERT model to incorporate the heterogeneous features of the audio and visual modalities. We introduce the Self-Multi-Attention Fusion module, Multi-Attention Fusion module, and Video Fusion module, which are attention-based multimodal fusion mechanisms built on the recently proposed transformer architecture. We explore the optimal ways to combine fine-grained representations of audio and visual features into a common embedding while combining a pre-trained BERT model with the other modalities for fine-tuning. In our experiments, we evaluate our approach on the widely used CMU-MOSI, CMU-MOSEI, and IEMOCAP datasets for multimodal sentiment analysis. Ablation analysis indicates that the audio and visual components make a significant contribution to the recognition results, suggesting that these modalities contain highly complementary information for sentiment analysis based on video input. Our method achieves state-of-the-art performance on the CMU-MOSI, CMU-MOSEI, and IEMOCAP datasets. | - |
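The abstract describes attention-based fusion of language, audio, and visual features via the transformer architecture. As a rough illustration of the general idea (not the paper's actual Self-Multi-Attention Fusion, Multi-Attention Fusion, or Video Fusion modules), the sketch below shows text token embeddings attending over stacked audio and visual frame features with standard scaled dot-product attention; all shapes, names, and the residual fusion step are hypothetical.

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    """Standard transformer attention: softmax(Q K^T / sqrt(d)) V."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    # Numerically stable softmax over the key axis.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def fuse_modalities(text, audio, visual):
    """Hypothetical fusion: text tokens attend over nonverbal frames,
    and the attended context is added back as a residual."""
    nonverbal = np.concatenate([audio, visual], axis=0)
    context = scaled_dot_product_attention(text, nonverbal, nonverbal)
    return text + context

# Toy shapes (all assumed): 4 text tokens, 6 audio frames,
# 5 visual frames, shared embedding dimension 8.
rng = np.random.default_rng(0)
text = rng.normal(size=(4, 8))
audio = rng.normal(size=(6, 8))
visual = rng.normal(size=(5, 8))
fused = fuse_modalities(text, audio, visual)
print(fused.shape)  # (4, 8)
```

The fused output keeps the text sequence length, so it could in principle feed back into a BERT-style encoder for fine-tuning, which is the spirit of the approach the abstract outlines.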
dc.language | English | - |
dc.language.iso | en | - |
dc.publisher | IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC | - |
dc.title | Multimodal Emotion Recognition Fusion Analysis Adapting BERT With Heterogeneous Feature Unification | - |
dc.type | Article | - |
dc.contributor.affiliatedAuthor | Ko, Hanseok | - |
dc.identifier.doi | 10.1109/ACCESS.2021.3092735 | - |
dc.identifier.scopusid | 2-s2.0-85112214088 | - |
dc.identifier.wosid | 000674231500001 | - |
dc.identifier.bibliographicCitation | IEEE ACCESS, v.9, pp.94557 - 94572 | - |
dc.relation.isPartOf | IEEE ACCESS | - |
dc.citation.title | IEEE ACCESS | - |
dc.citation.volume | 9 | - |
dc.citation.startPage | 94557 | - |
dc.citation.endPage | 94572 | - |
dc.type.rims | ART | - |
dc.type.docType | Article | - |
dc.description.journalClass | 1 | - |
dc.description.journalRegisteredClass | scie | - |
dc.description.journalRegisteredClass | scopus | - |
dc.relation.journalResearchArea | Computer Science | - |
dc.relation.journalResearchArea | Engineering | - |
dc.relation.journalResearchArea | Telecommunications | - |
dc.relation.journalWebOfScienceCategory | Computer Science, Information Systems | - |
dc.relation.journalWebOfScienceCategory | Engineering, Electrical & Electronic | - |
dc.relation.journalWebOfScienceCategory | Telecommunications | - |
dc.subject.keywordPlus | SPEECH | - |
dc.subject.keywordAuthor | BERT | - |
dc.subject.keywordAuthor | Bit error rate | - |
dc.subject.keywordAuthor | Computer architecture | - |
dc.subject.keywordAuthor | Deep learning | - |
dc.subject.keywordAuthor | Emotion recognition | - |
dc.subject.keywordAuthor | Feature extraction | - |
dc.subject.keywordAuthor | Multimodal emotion recognition | - |
dc.subject.keywordAuthor | Sentiment analysis | - |
dc.subject.keywordAuthor | Visualization | - |
dc.subject.keywordAuthor | attention based multimodal | - |
dc.subject.keywordAuthor | heterogeneous features | - |
dc.subject.keywordAuthor | transformer | - |