Hybrid convolutional neural networks for articulatory and acoustic information based speech recognition
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Mitra, Vikramjit | - |
dc.contributor.author | Sivaraman, Ganesh | - |
dc.contributor.author | Nam, Hosung | - |
dc.contributor.author | Espy-Wilson, Carol | - |
dc.contributor.author | Saltzman, Elliot | - |
dc.contributor.author | Tiede, Mark | - |
dc.date.accessioned | 2021-09-03T06:58:06Z | - |
dc.date.available | 2021-09-03T06:58:06Z | - |
dc.date.created | 2021-06-16 | - |
dc.date.issued | 2017-05 | - |
dc.identifier.issn | 0167-6393 | - |
dc.identifier.uri | https://scholar.korea.ac.kr/handle/2021.sw.korea/83685 | - |
dc.description.abstract | Studies have shown that articulatory information helps model speech variability and, consequently, improves speech recognition performance. But learning speaker-invariant articulatory models is challenging, as speaker-specific signatures in both the articulatory and acoustic spaces increase the complexity of the speech-to-articulatory mapping, which is already an ill-posed problem due to its inherent nonlinearity and non-uniqueness. This work explores using deep neural networks (DNNs) and convolutional neural networks (CNNs) for mapping speech data into its corresponding articulatory space. Our speech-inversion results indicate that the CNN models perform better than their DNN counterparts. In addition, we use these inverse models to generate articulatory information from speech for two separate continuous speech recognition tasks: WSJ1 and Aurora-4. This work proposes a hybrid convolutional neural network (HCNN), in which two parallel layers jointly model the acoustic and articulatory spaces, and the decisions from the parallel layers are fused at the output context-dependent (CD) state level. The acoustic model performs time-frequency convolution on filterbank-energy-level features, whereas the articulatory model performs time convolution on the articulatory features. The performance of the proposed architecture is compared to that of CNN- and DNN-based systems using gammatone filterbank energies as acoustic features, and the results indicate that the HCNN-based model achieves lower word error rates than the CNN/DNN baseline systems. (C) 2017 Elsevier B.V. All rights reserved. | - |
dc.language | English | - |
dc.language.iso | en | - |
dc.publisher | ELSEVIER SCIENCE BV | - |
dc.subject | ROBUST FEATURES | - |
dc.subject | MODEL | - |
dc.title | Hybrid convolutional neural networks for articulatory and acoustic information based speech recognition | - |
dc.type | Article | - |
dc.contributor.affiliatedAuthor | Nam, Hosung | - |
dc.identifier.doi | 10.1016/j.specom.2017.03.003 | - |
dc.identifier.scopusid | 2-s2.0-85017178582 | - |
dc.identifier.wosid | 000401211200010 | - |
dc.identifier.bibliographicCitation | SPEECH COMMUNICATION, v.89, pp.103 - 112 | - |
dc.relation.isPartOf | SPEECH COMMUNICATION | - |
dc.citation.title | SPEECH COMMUNICATION | - |
dc.citation.volume | 89 | - |
dc.citation.startPage | 103 | - |
dc.citation.endPage | 112 | - |
dc.type.rims | ART | - |
dc.type.docType | Article | - |
dc.description.journalClass | 1 | - |
dc.description.journalRegisteredClass | scie | - |
dc.description.journalRegisteredClass | scopus | - |
dc.relation.journalResearchArea | Acoustics | - |
dc.relation.journalResearchArea | Computer Science | - |
dc.relation.journalWebOfScienceCategory | Acoustics | - |
dc.relation.journalWebOfScienceCategory | Computer Science, Interdisciplinary Applications | - |
dc.subject.keywordPlus | ROBUST FEATURES | - |
dc.subject.keywordPlus | MODEL | - |
dc.subject.keywordAuthor | Automatic speech recognition | - |
dc.subject.keywordAuthor | Articulatory trajectories | - |
dc.subject.keywordAuthor | Vocal tract variables | - |
dc.subject.keywordAuthor | Hybrid convolutional neural networks | - |
dc.subject.keywordAuthor | Time-frequency convolution | - |
dc.subject.keywordAuthor | Convolutional neural networks | - |
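The abstract describes the HCNN as two parallel branches, a time-frequency convolution over filterbank energies and a time convolution over articulatory trajectories, fused before the output layer over context-dependent (CD) states. A minimal NumPy sketch of that fusion idea follows; all dimensions, filter counts, and weights are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

# Hypothetical sketch of the HCNN fusion idea from the abstract:
# an acoustic branch applies time-frequency (2-D) convolution to
# filterbank energies, an articulatory branch applies time (1-D)
# convolution to articulatory trajectories, and the branch outputs
# are fused before a softmax over CD states. All sizes and weights
# below are illustrative, not taken from the paper.

rng = np.random.default_rng(0)

def conv2d_valid(x, k):
    """Naive 'valid' 2-D convolution over (time, frequency)."""
    T, F = x.shape
    kt, kf = k.shape
    out = np.empty((T - kt + 1, F - kf + 1))
    for t in range(out.shape[0]):
        for f in range(out.shape[1]):
            out[t, f] = np.sum(x[t:t + kt, f:f + kf] * k)
    return out

def conv1d_valid(x, k):
    """Naive 'valid' 1-D convolution along time, summed over channels."""
    T, _ = x.shape
    kt = k.shape[0]
    return np.array([np.sum(x[t:t + kt, :] * k) for t in range(T - kt + 1)])

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Toy inputs: 11 frames x 40 gammatone filterbank energies, and
# 11 frames x 8 articulatory (vocal tract variable) trajectories.
acoustic = rng.standard_normal((11, 40))
articulatory = rng.standard_normal((11, 8))

# Parallel branches (one random filter each, ReLU nonlinearity).
acoustic_map = np.maximum(conv2d_valid(acoustic, rng.standard_normal((3, 5))), 0)
artic_map = np.maximum(conv1d_valid(articulatory, rng.standard_normal((3, 8))), 0)

# Fuse: flatten both branch outputs into a single feature vector.
fused = np.concatenate([acoustic_map.ravel(), artic_map.ravel()])

# Output layer over a toy inventory of 10 CD states.
W = rng.standard_normal((10, fused.size)) * 0.01
posteriors = softmax(W @ fused)
print(posteriors.shape)  # (10,)
```

In the paper the fusion happens at the CD-state output level; here that is approximated by concatenating the two branch feature maps before a single output layer, which is the simplest form of late feature fusion.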