Unsupervised speaker adaptation for speaker independent acoustic to articulatory speech inversion

Sivaraman, Ganesh; Mitra, Vikramjit; Nam, Hosung; Tiede, Mark; Espy-Wilson, Carol

doi:10.1121/1.5116130

Full metadata record

DC Field	Value	Language
dc.contributor.author	Sivaraman, Ganesh	-
dc.contributor.author	Mitra, Vikramjit	-
dc.contributor.author	Nam, Hosung	-
dc.contributor.author	Tiede, Mark	-
dc.contributor.author	Espy-Wilson, Carol	-
dc.date.accessioned	2021-09-01T13:27:09Z	-
dc.date.available	2021-09-01T13:27:09Z	-
dc.date.created	2021-06-18	-
dc.date.issued	2019-07	-
dc.identifier.issn	0001-4966	-
dc.identifier.uri	https://scholar.korea.ac.kr/handle/2021.sw.korea/64634	-
dc.description.abstract	Speech inversion is a well-known ill-posed problem and addition of speaker differences typically makes it even harder. Normalizing the speaker differences is essential to effectively using multi-speaker articulatory data for training a speaker independent speech inversion system. This paper explores a vocal tract length normalization (VTLN) technique to transform the acoustic features of different speakers to a target speaker acoustic space such that speaker specific details are minimized. The speaker normalized features are then used to train a deep feed-forward neural network based speech inversion system. The acoustic features are parameterized as time-contextualized mel-frequency cepstral coefficients. The articulatory features are represented by six tract-variable (TV) trajectories, which are relatively speaker invariant compared to flesh point data. Experiments are performed with ten speakers from the University of Wisconsin X-ray microbeam database. Results show that the proposed speaker normalization approach provides an 8.15% relative improvement in correlation between actual and estimated TVs as compared to the system where speaker normalization was not performed. To determine the efficacy of the method across datasets, cross speaker evaluations were performed across speakers from the Multichannel Articulatory-TIMIT and EMA-IEEE datasets. Results prove that the VTLN approach provides improvement in performance even across datasets. © 2019 Acoustical Society of America.	-
dc.language	English	-
dc.language.iso	en	-
dc.publisher	ACOUSTICAL SOC AMER AMER INST PHYSICS	-
dc.subject	VOCAL-TRACT	-
dc.subject	MOVEMENTS	-
dc.title	Unsupervised speaker adaptation for speaker independent acoustic to articulatory speech inversion	-
dc.type	Article	-
dc.contributor.affiliatedAuthor	Nam, Hosung	-
dc.identifier.doi	10.1121/1.5116130	-
dc.identifier.scopusid	2-s2.0-85069781363	-
dc.identifier.wosid	000478628800044	-
dc.identifier.bibliographicCitation	JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, v.146, no.1, pp.316 - 329	-
dc.relation.isPartOf	JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA	-
dc.citation.title	JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA	-
dc.citation.volume	146	-
dc.citation.number	1	-
dc.citation.startPage	316	-
dc.citation.endPage	329	-
dc.type.rims	ART	-
dc.type.docType	Article	-
dc.description.journalClass	1	-
dc.description.journalRegisteredClass	scie	-
dc.description.journalRegisteredClass	scopus	-
dc.relation.journalResearchArea	Acoustics	-
dc.relation.journalResearchArea	Audiology & Speech-Language Pathology	-
dc.relation.journalWebOfScienceCategory	Acoustics	-
dc.relation.journalWebOfScienceCategory	Audiology & Speech-Language Pathology	-
dc.subject.keywordPlus	VOCAL-TRACT	-
dc.subject.keywordPlus	MOVEMENTS	-
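
The abstract mentions three computations that a short example can make concrete: time-contextualized acoustic features, the correlation between actual and estimated TV trajectories, and a relative improvement figure. The following is a minimal sketch in plain Python, not the authors' implementation. The function names (stack_context, pearson, relative_improvement), the context width, the edge-padding rule, and all numbers are hypothetical and chosen only for illustration; the record does not state the context window or the raw correlation values.

```python
import math


def stack_context(frames, context):
    """Concatenate each frame with `context` neighbours on either side.

    `frames` is a list of equal-length feature vectors (for example MFCCs).
    Edge frames are padded by repeating the first or last frame; this
    padding rule is an assumption, not taken from the paper.
    """
    n = len(frames)
    stacked = []
    for t in range(n):
        row = []
        for offset in range(-context, context + 1):
            idx = min(max(t + offset, 0), n - 1)  # clamp to valid range
            row.extend(frames[idx])
        stacked.append(row)
    return stacked


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


def relative_improvement(baseline, improved):
    """Relative improvement of `improved` over `baseline`, in percent."""
    return 100.0 * (improved - baseline) / baseline


if __name__ == "__main__":
    # Three toy 2-dimensional frames, stacked with one frame of context.
    toy_frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    print(stack_context(toy_frames, context=1))

    # Toy "actual" and "estimated" trajectories for a single tract variable.
    actual = [0.1, 0.4, 0.9, 0.7, 0.2]
    estimated = [0.2, 0.5, 0.8, 0.6, 0.3]
    print(pearson(actual, estimated))

    # Illustrative correlations only; these are not results from the paper.
    print(relative_improvement(baseline=0.5, improved=0.6))
```

With a context of one frame, each stacked vector is three times the length of the original frame. A relative improvement is computed against the baseline value, so it is not the same as the absolute difference between the two correlations.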

Files in This Item: There are no files associated with this item.

Appears in Collections: College of Liberal Arts > Department of English Language and Literature > 1. Journal Articles

Related Researcher

Nam, Ho sung: College of Liberal Arts (Department of English Language and Literature)
