Multi-modal Korean Emotion Recognition with Consistency Regularization (Consistency Regularization을 적용한 멀티모달 한국어 감정인식)

김정희; 강필성

Full metadata record

DC Field	Value	Language
dc.contributor.author	김정희	-
dc.contributor.author	강필성	-
dc.date.accessioned	2022-11-05T07:41:32Z	-
dc.date.available	2022-11-05T07:41:32Z	-
dc.date.created	2022-11-04	-
dc.date.issued	2021	-
dc.identifier.issn	1225-0988	-
dc.identifier.uri	https://scholar.korea.ac.kr/handle/2021.sw.korea/144790	-
dc.description.abstract	Demand is growing for artificial-intelligence-based voice services that identify user needs from speech and respond appropriately. In particular, technology for recognizing emotion, which is non-verbal information carried by the human voice, is receiving significant attention as a way to improve the quality of voice services. Accordingly, deep-learning-based speech emotion recognition models are being actively studied using abundant English data, and a multi-modal emotion recognition framework with a speech recognition module has been proposed to use both voice and text information. However, a framework that relies on a speech recognition module is at a disadvantage in real environments with ambient noise: its performance falls as the speech recognition rate falls. In addition, applying deep-learning-based models to Korean emotion recognition is challenging because, unlike for English, emotion data is not abundant. To address this drawback, we propose a consistency regularization learning methodology that lets the model reflect the difference between the actual content of the speech and the text extracted by the speech recognition module. Considering the limited Korean emotion data, we also adapt self-supervised pre-trained models such as Wav2vec 2.0 and HanBERT to the framework. Our experimental results show that the framework with pre-trained models outperforms a model trained on speech alone on a Korean multi-modal emotion dataset, and that the proposed learning methodology can minimize the performance degradation caused by poorly performing speech recognition modules.	-
dc.language	Korean	-
dc.language.iso	ko	-
dc.publisher	대한산업공학회	-
dc.title	Consistency Regularization을 적용한 멀티모달 한국어 감정인식	-
dc.title.alternative	Multi-modal Korean Emotion Recognition with Consistency Regularization	-
dc.type	Article	-
dc.contributor.affiliatedAuthor	강필성	-
dc.identifier.bibliographicCitation	대한산업공학회지, v.47, no.6, pp.549 - 559	-
dc.relation.isPartOf	대한산업공학회지	-
dc.citation.title	대한산업공학회지	-
dc.citation.volume	47	-
dc.citation.number	6	-
dc.citation.startPage	549	-
dc.citation.endPage	559	-
dc.type.rims	ART	-
dc.identifier.kciid	ART002785125	-
dc.description.journalClass	2	-
dc.description.journalRegisteredClass	kci	-
dc.subject.keywordAuthor	Speech Emotion Recognition	-
dc.subject.keywordAuthor	Wav2vec 2.0	-
dc.subject.keywordAuthor	Multi-Modal Emotion Recognition	-
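
The abstract does not spell out the form of the consistency regularization term, so the following is only a minimal sketch of one common formulation, not the authors' method. It assumes the model produces emotion logits twice, once with a reference transcript and once with the speech recognition output, and penalizes the KL divergence between the two predicted distributions on top of the usual cross-entropy. All function names, the `weight` parameter, and the choice of KL divergence are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw scores into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def cross_entropy(probs, label, eps=1e-12):
    """Negative log-likelihood of the true emotion class."""
    return -math.log(probs[label] + eps)


def consistency_loss(logits_ref, logits_asr, label, weight=1.0):
    """Supervised loss on both branches plus a weighted consistency penalty.

    logits_ref: emotion logits when the text branch sees a reference transcript
                (assumption: such a transcript is available during training).
    logits_asr: emotion logits when the text branch sees the ASR output.
    weight:     hypothetical coefficient balancing the two terms.
    """
    p_ref = softmax(logits_ref)
    p_asr = softmax(logits_asr)
    supervised = cross_entropy(p_ref, label) + cross_entropy(p_asr, label)
    consistency = kl_divergence(p_ref, p_asr)
    return supervised + weight * consistency


if __name__ == "__main__":
    # Three hypothetical emotion classes; the ASR branch disagrees with the reference branch.
    print(consistency_loss([2.0, 0.5, -1.0], [0.0, 1.5, 0.2], label=0))
```

When the two branches agree exactly, the consistency term vanishes and only the supervised loss remains; the further the ASR-based prediction drifts from the reference-based one, the larger the penalty.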

Appears in Collections: College of Engineering > School of Industrial and Management Engineering > 1. Journal Articles
