Detailed Information

Cited 3 times in Web of Science · Cited 4 times in Scopus

Impact of Feature Selection Algorithm on Speech Emotion Recognition Using Deep Convolutional Neural Network

Authors
Farooq, Misbah; Hussain, Fawad; Baloch, Naveed Khan; Raja, Fawad Riasat; Yu, Heejung; Zikria, Yousaf Bin
Issue Date
Nov-2020
Publisher
MDPI
Keywords
speech emotion recognition; deep convolutional neural network; correlation-based feature selection
Citation
SENSORS, v.20, no.21
Indexed
SCIE
SCOPUS
Journal Title
SENSORS
Volume
20
Number
21
URI
https://scholar.korea.ac.kr/handle/2021.sw.korea/51984
DOI
10.3390/s20216008
ISSN
1424-8220
Abstract
Speech emotion recognition (SER) plays a significant role in human-machine interaction. Emotion recognition from speech and its precise classification is a challenging task because a machine is unable to understand its context. For an accurate emotion classification, emotionally relevant features must be extracted from the speech data. Traditionally, handcrafted features were used for emotion classification from speech signals; however, they are not efficient enough to accurately depict the emotional state of the speaker. In this study, the benefits of a deep convolutional neural network (DCNN) for SER are explored. For this purpose, a pretrained network is used to extract features from state-of-the-art speech emotion datasets. Subsequently, a correlation-based feature selection technique is applied to the extracted features to select the most appropriate and discriminative features for SER. For the classification of emotions, we utilize support vector machines, random forests, the k-nearest neighbors algorithm, and neural network classifiers. Experiments are performed for speaker-dependent and speaker-independent SER using four publicly available datasets: the Berlin Dataset of Emotional Speech (Emo-DB), Surrey Audio Visual Expressed Emotion (SAVEE), Interactive Emotional Dyadic Motion Capture (IEMOCAP), and the Ryerson Audio Visual Dataset of Emotional Speech and Song (RAVDESS). Our proposed method achieves an accuracy of 95.10% for Emo-DB, 82.10% for SAVEE, 83.80% for IEMOCAP, and 81.30% for RAVDESS in speaker-dependent SER experiments. Moreover, our method yields the best results for speaker-independent SER compared with existing handcrafted feature-based SER approaches.
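As a rough illustration of the pipeline described in the abstract (DCNN feature extraction, correlation-based feature selection, then a classical classifier), the sketch below shows the selection and classification stages only. This is not the authors' implementation: the feature matrix X is synthetic stand-in data for DCNN features, the correlation_select function is a simple relevance/redundancy heuristic inspired by correlation-based feature selection, and the parameter values (k, redundancy_thresh) are illustrative assumptions.

```python
# Minimal sketch, NOT the paper's code: synthetic features, a crude
# correlation-based filter, and an SVM classifier (one of the four
# classifier families mentioned in the abstract).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
n_samples, n_features, n_classes = 500, 256, 7   # e.g., 7 Emo-DB emotion classes
X = rng.normal(size=(n_samples, n_features))     # placeholder for DCNN features
y = rng.integers(0, n_classes, size=n_samples)   # placeholder emotion labels

def correlation_select(X, y, k=64, redundancy_thresh=0.9):
    """Rank features by |corr(feature, label)| and greedily skip any feature
    that is highly correlated with one already selected (relevance/redundancy
    heuristic standing in for correlation-based feature selection)."""
    relevance = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    order = np.argsort(relevance)[::-1]
    selected = []
    for j in order:
        if all(abs(np.corrcoef(X[:, j], X[:, s])[0, 1]) < redundancy_thresh
               for s in selected):
            selected.append(j)
        if len(selected) == k:
            break
    return np.array(selected)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
cols = correlation_select(X_tr, y_tr, k=64)
scaler = StandardScaler().fit(X_tr[:, cols])
clf = SVC(kernel="rbf").fit(scaler.transform(X_tr[:, cols]), y_tr)
print("accuracy:", accuracy_score(y_te, clf.predict(scaler.transform(X_te[:, cols]))))
```

With real DCNN features in place of the synthetic X, the same selection-then-classify structure applies; the speaker-dependent/independent distinction in the paper comes from how the train/test split is formed, not from this code.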
Files in This Item
There are no files associated with this item.
Appears in Collections
Graduate School > Department of Electronics and Information Engineering > 1. Journal Articles

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.
