Detailed Information


Unsupervised speaker adaptation for speaker independent acoustic to articulatory speech inversion

Authors
Sivaraman, Ganesh; Mitra, Vikramjit; Nam, Hosung; Tiede, Mark; Espy-Wilson, Carol
Issue Date
July 2019
Publisher
Acoustical Society of America / American Institute of Physics
Citation
JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, v.146, no.1, pp.316-329
Indexed
SCIE
SCOPUS
Journal Title
JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA
Volume
146
Number
1
Start Page
316
End Page
329
URI
https://scholar.korea.ac.kr/handle/2021.sw.korea/64634
DOI
10.1121/1.5116130
ISSN
0001-4966
Abstract
Speech inversion is a well-known ill-posed problem, and the addition of speaker differences typically makes it even harder. Normalizing these differences is essential for effectively using multi-speaker articulatory data to train a speaker independent speech inversion system. This paper explores a vocal tract length normalization (VTLN) technique to transform the acoustic features of different speakers to a target speaker's acoustic space such that speaker-specific details are minimized. The speaker-normalized features are then used to train a deep feed-forward neural network based speech inversion system. The acoustic features are parameterized as time-contextualized mel-frequency cepstral coefficients. The articulatory features are represented by six tract-variable (TV) trajectories, which are relatively speaker invariant compared to flesh-point data. Experiments are performed with ten speakers from the University of Wisconsin X-ray microbeam database. Results show that the proposed speaker normalization approach provides an 8.15% relative improvement in correlation between actual and estimated TVs as compared to the system where speaker normalization was not performed. To determine the efficacy of the method across datasets, cross-speaker evaluations were performed across speakers from the Multichannel Articulatory-TIMIT and EMA-IEEE datasets. Results show that the VTLN approach provides an improvement in performance even across datasets. (C) 2019 Acoustical Society of America.
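The abstract's reported 8.15% relative improvement refers to the correlation between actual and estimated TV trajectories. As a minimal illustration of that evaluation metric (not the paper's code; the trajectory values below are hypothetical), the per-TV Pearson correlation can be computed as:

```python
import math

def pearson_corr(x, y):
    """Pearson correlation between two equal-length trajectories."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical ground-truth and estimated TV samples
# (e.g., lip aperture over six analysis frames)
actual    = [0.10, 0.25, 0.40, 0.35, 0.20, 0.05]
estimated = [0.12, 0.22, 0.38, 0.37, 0.18, 0.08]
print(pearson_corr(actual, estimated))
```

In the paper's setup this score would be computed per TV trajectory and averaged over the six TVs; a relative improvement is then the percentage gain of the normalized system's average correlation over the unnormalized baseline's.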
Files in This Item
There are no files associated with this item.
Appears in Collections
College of Liberal Arts > Department of English Language and Literature > 1. Journal Articles


Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

Nam, Ho sung
College of Liberal Arts (Department of English Language and Literature)
