Multi-Modal Recurrent Attention Networks for Facial Expression Recognition

Lee, Jiyoung; Kim, Sunok; Kim, Seungryong; Sohn, Kwanghoon

doi:10.1109/TIP.2020.2996086

Issue Date: 2020

Publisher: IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

Keywords: Face recognition; Image color analysis; Videos; Emotion recognition; Benchmark testing; Databases; Task analysis; Multi-modal facial expression recognition; dimensional (continuous) emotion recognition; attention mechanism

Citation: IEEE TRANSACTIONS ON IMAGE PROCESSING, v.29, pp.6977 - 6991

Indexed: SCIE; SCOPUS

Journal Title: IEEE TRANSACTIONS ON IMAGE PROCESSING

Volume: 29

Start Page: 6977

End Page: 6991

URI: https://scholar.korea.ac.kr/handle/2021.sw.korea/59019

ISSN: 1057-7149

Abstract: Recent methods based on deep neural networks have achieved state-of-the-art performance on various facial expression recognition tasks. Despite this progress, previous research on facial expression recognition has focused mainly on analyzing color videos alone. However, the complex emotions that people with different skin colors express through dynamic facial expressions under different lighting conditions can be fully understood only by integrating information from multi-modal videos. We present a novel method for estimating dimensional emotion states that takes color, depth, and thermal videos as multi-modal input. Our networks, called multi-modal recurrent attention networks (MRAN), learn spatiotemporal attention volumes to robustly recognize facial expressions from attention-boosted feature volumes. We use the depth and thermal sequences as guidance priors for the color sequence, so that the networks selectively focus on emotionally discriminative regions. We also introduce a novel benchmark for multi-modal facial expression recognition, termed multi-modal arousal-valence facial expression recognition (MAVFER), which consists of color, depth, and thermal videos with corresponding continuous arousal-valence scores. Experimental results show that our method achieves state-of-the-art results in dimensional facial expression recognition on the color datasets RECOLA, SEWA, and AFEW, and on the multi-modal dataset MAVFER.
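The abstract describes depth and thermal sequences acting as guidance priors that produce spatiotemporal attention volumes, which in turn boost the color features. This record does not give the actual MRAN architecture. The following is only a minimal NumPy sketch of that general idea, not the authors' method.

It rests on several assumptions:

- Per-frame feature volumes have already been extracted for each modality.
- A hypothetical linear projection `w` maps the stacked channels to one attention logit per location.
- A hypothetical scalar `u` carries attention from one frame to the next.
- Features are "boosted" as `F * (1 + A)`, a common residual-attention choice that is assumed here.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    """Logistic function mapping logits to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def recurrent_guided_attention(color_feat, guide_feats, w, u=0.5, b=0.0):
    """Hypothetical guided spatiotemporal attention (NOT the MRAN architecture).

    color_feat:  (T, H, W, C) feature volume from the color stream.
    guide_feats: list of (T, H, W, C_g) volumes from guidance streams,
                 e.g. depth and thermal features.
    w:           (C + sum of C_g,) hypothetical linear projection to one logit.
    u:           hypothetical scalar weight carrying attention across frames.
    b:           scalar bias.
    Returns (boosted, attn) with shapes (T, H, W, C) and (T, H, W, 1).
    """
    # Stack color and guidance channels at every spatiotemporal location.
    x = np.concatenate([color_feat, *guide_feats], axis=-1)
    logits = x @ w + b  # (T, H, W): one attention logit per location

    # Simple recurrence: frame t's attention also depends on frame t-1's.
    attn = np.empty_like(logits)
    prev = np.zeros_like(logits[0])
    for t in range(logits.shape[0]):
        prev = sigmoid(logits[t] + u * prev)
        attn[t] = prev
    attn = attn[..., np.newaxis]  # (T, H, W, 1), broadcast over channels

    # Residual boosting: attended locations are amplified, none are zeroed out.
    boosted = color_feat * (1.0 + attn)
    return boosted, attn


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    color = rng.standard_normal((4, 8, 8, 16))
    depth = rng.standard_normal((4, 8, 8, 4))
    thermal = rng.standard_normal((4, 8, 8, 4))
    w = 0.1 * rng.standard_normal(16 + 4 + 4)
    boosted, attn = recurrent_guided_attention(color, [depth, thermal], w)
    print(boosted.shape, attn.shape)  # (4, 8, 8, 16) (4, 8, 8, 1)
```

The multiplicative `1 + A` form keeps every color feature at least at its original magnitude. Guidance from depth and thermal can therefore emphasize regions without discarding the rest of the face. Whether MRAN uses this form, or a different fusion and recurrence, is not stated here.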
